minimum spanning tree
graph theory
vertex subset
spanning tree optimization
algorithm design

Construct a minimum spanning tree covering a specific subset of the vertices

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

If you only need to connect a specific subset of vertices, the problem is no longer an ordinary minimum spanning tree problem. It is a Steiner tree problem: connect the required terminal vertices as cheaply as possible, optionally using extra non-terminal vertices when they help reduce total cost.

Why this is not just "run MST on the whole graph"

A minimum spanning tree connects every vertex in the graph. Your problem is different because only some vertices are mandatory.

That changes the optimization goal:

  • ordinary MST: span all vertices
  • Steiner tree: span only the required terminals, while allowing helpful intermediate vertices

Those extra intermediate vertices are called Steiner vertices. They can appear in the optimal solution even though they are not part of the required subset.

The exact problem is NP-hard

For a general weighted graph, the Steiner tree problem is NP-hard. That means you should not expect a simple Kruskal-or-Prim-style exact algorithm that stays efficient on large arbitrary inputs.

So the real engineering question is usually:

  • do you need an exact answer for a small instance
  • or a good approximation for a larger one

That distinction matters much more than the phrase "minimum spanning tree" in the title.

A common approximation: metric closure on the terminal set

One standard approximation approach is:

  1. compute shortest-path distances between all terminal vertices
  2. build a complete graph on the terminals using those distances
  3. compute an MST on that complete terminal graph
  4. expand those MST edges back into shortest paths in the original graph
  5. prune unnecessary non-terminal leaves

This gives a reasonable approximation and is much more practical than exact exponential search on larger inputs.

Here is a small NetworkX sketch:

python
1import networkx as nx
2
3G = nx.Graph()
4G.add_weighted_edges_from([
5    ("A", "B", 1),
6    ("B", "C", 1),
7    ("C", "D", 1),
8    ("A", "D", 4),
9    ("B", "E", 2),
10    ("D", "E", 2),
11])
12
13terminals = ["A", "D", "E"]
14
15T = nx.algorithms.approximation.steiner_tree(G, terminals, weight="weight")
16print(T.edges(data=True))

That uses a built-in approximation instead of hand-implementing every step.

Pruning an all-vertex MST is not generally optimal

A tempting shortcut is:

  • build the MST of the whole graph
  • repeatedly remove leaves that are not required terminals

This can produce a valid tree covering the subset, but it is not generally optimal for the Steiner problem. The globally cheapest all-vertex spanning tree may force you into worse terminal-connection choices than a Steiner-aware construction would.

So pruning an MST is a heuristic, not the formal solution to the general subset problem.

Exact solutions are realistic only for small instances

If the graph or terminal set is small, exact methods such as dynamic programming over subsets or integer programming can be feasible. Those are useful when:

  • the instance size is limited
  • exact optimality matters
  • approximation error is unacceptable

But once the graph gets large, exact Steiner methods become expensive quickly.

Choose the algorithm by problem size and guarantees

A practical rule is:

  • use exact Steiner methods for small instances with strong optimality requirements
  • use approximation algorithms for larger instances
  • use MST-pruning only when a heuristic is acceptable

The important conceptual fix is recognizing that "MST over a subset" is really a different problem class.

Common Pitfalls

  • Assuming an ordinary MST algorithm directly solves the subset-terminal problem.
  • Forgetting that extra non-terminal vertices may appear in the optimal tree.
  • Using whole-graph MST pruning and assuming the result is exact.
  • Asking for exact optimality on large graphs without considering NP-hardness.
  • Ignoring shortest-path structure between required terminals when designing an approximation.

Summary

  • Connecting only a required subset of vertices is usually a Steiner tree problem, not a standard MST problem.
  • The exact Steiner problem is NP-hard on general graphs.
  • A common practical approach is to approximate via terminal distances and an MST on the terminal metric closure.
  • Pruning a full-graph MST can work as a heuristic, but it is not generally optimal.
  • The right method depends on graph size and whether you need an exact answer or a good approximation.

Course illustration
Course illustration

All Rights Reserved.