minimum spanning tree
algorithm design
graph theory
computer science
combinatorial optimization

Algorithm to find minimum spanning tree of chosen vertices

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

This problem sounds like a small variation of a minimum spanning tree, but there are actually two different versions. The right algorithm depends on whether the final tree may use only the chosen vertices or whether it may also include extra helper vertices from the original graph.

Clarify the Problem First

Suppose you have a weighted graph and a terminal set S. You want the cheapest tree that connects every vertex in S.

There are two interpretations:

  1. Only the chosen vertices may appear in the answer.
  2. Extra vertices are allowed if they help connect the chosen ones more cheaply.

The first case is a standard MST problem on a smaller graph. The second case is the Steiner tree problem, which is much harder.

Case 1: Only the Chosen Vertices May Appear

If the output tree must contain only the chosen vertices, the algorithm is simple:

  1. Build the subgraph induced by the chosen vertices.
  2. Check that the induced subgraph is connected.
  3. Run any MST algorithm, usually Kruskal or Prim, on that subgraph.

This gives the exact optimum because you have reduced the problem to an ordinary MST.

Kruskal is often the easiest implementation when you already have an edge list. Sort the edges by weight, then keep adding the cheapest edge that connects two different components.

python
1class DSU:
2    def __init__(self, items):
3        self.parent = {item: item for item in items}
4
5    def find(self, x):
6        while self.parent[x] != x:
7            self.parent[x] = self.parent[self.parent[x]]
8            x = self.parent[x]
9        return x
10
11    def union(self, a, b):
12        ra = self.find(a)
13        rb = self.find(b)
14        if ra == rb:
15            return False
16        self.parent[rb] = ra
17        return True
18
19
20def mst_on_chosen_vertices(vertices, edges):
21    dsu = DSU(vertices)
22    tree = []
23    total = 0
24
25    for w, u, v in sorted(edges):
26        if u in vertices and v in vertices and dsu.union(u, v):
27            tree.append((u, v, w))
28            total += w
29
30    if len(tree) != len(vertices) - 1:
31        raise ValueError("Chosen vertices are not connected in the induced subgraph")
32
33    return total, tree
34
35
36chosen = {"A", "C", "D", "E"}
37edges = [
38    (4, "A", "C"),
39    (2, "C", "D"),
40    (3, "D", "E"),
41    (7, "A", "E"),
42    (8, "A", "D"),
43]
44
45cost, tree = mst_on_chosen_vertices(chosen, edges)
46print(cost)
47print(tree)

The running time is dominated by sorting, so it is O(E log E) for the induced subgraph.

Case 2: Extra Vertices Are Allowed

If you are allowed to use non-terminal vertices as connectors, this is no longer a plain MST. It becomes the Steiner tree problem on graphs.

That matters because the best solution may include vertices that were not requested. For example, connecting terminals A, D, and F through a cheap hub X can be less expensive than using direct edges only among the terminals.

There is no known polynomial-time algorithm for the exact general case. Practical options are:

  • Use an exact exponential algorithm if the terminal set is small.
  • Use an approximation algorithm for larger graphs.
  • Use a library implementation if this is an engineering task rather than an algorithms exercise.

One common approximation is:

  1. Compute shortest-path distances between all terminals.
  2. Build the metric closure on the terminals.
  3. Compute an MST on that complete terminal graph.
  4. Expand each MST edge back to its original shortest path.
  5. Prune leaves that are not terminals.

This is much better than running MST directly on the original graph without understanding the terminal-only constraint.

How to Decide Which Algorithm You Need

Ask one question: may the result include non-chosen vertices?

If the answer is no, induced-subgraph MST is exact and efficient.

If the answer is yes, call the problem Steiner tree. That naming is important because it tells you immediately that a simple tweak to Prim or Kruskal will not solve the general case exactly.

In interviews and code reviews, many incorrect solutions come from mixing these two problems together.

Common Pitfalls

The biggest mistake is assuming that "MST of chosen vertices" always means the chosen vertices alone. In many real problems, you are allowed to route through intermediate vertices, which changes the problem class entirely.

Another mistake is stopping Kruskal as soon as all terminals become connected while still allowing arbitrary extra vertices. That can produce a connected subgraph, but it is not an exact Steiner algorithm.

People also forget connectivity checks. If the chosen vertices are disconnected in the induced subgraph, there is no valid MST for case 1.

Finally, be careful with weighted shortest paths in the Steiner-style approximation. You need shortest-path distances between terminals, not just direct edges from the input graph.

Summary

  • If only the chosen vertices may appear, build the induced subgraph and run a normal MST algorithm.
  • If extra helper vertices may appear, the problem is a Steiner tree problem, not a standard MST.
  • Kruskal or Prim is exact for the induced-subgraph version.
  • Metric-closure plus MST is a common approximation when Steiner vertices are allowed.
  • Always clarify the problem statement before choosing the algorithm.

Course illustration
Course illustration

All Rights Reserved.