Calculate minimal operations to make two tree structures identical
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
The standard way to frame this problem is as a tree edit-distance problem: what is the minimum number of insert, delete, and relabel operations needed to transform one tree into another? Once you say it that way, the task stops being vague and becomes a dynamic-programming problem on rooted subtrees. The exact algorithm depends on whether the trees are ordered or unordered, because child ordering changes the matching problem significantly.
Define The Edit Operations Clearly
Before writing code, decide what counts as an operation. A common model is:
- insert a node
- delete a node
- relabel a node
If node labels match, relabel cost is 0; otherwise it is usually 1. For ordered trees, child position matters, so you compare child sequences in order. For unordered trees, the problem is harder because you also have to solve an assignment problem between children.
This article sticks to rooted ordered trees, which is the usual tractable version for implementation.
A Simple Tree Representation
Here is a small Python representation for ordered trees.
The edit distance between two trees then becomes a recursive cost on node labels plus a sequence-edit cost on the child lists.
Core Dynamic-Programming Idea
Suppose you want the edit distance between two nodes a and b.
- pay
0ifa.value == b.value, otherwise pay1for relabeling - compute the minimum cost to transform the ordered list
a.childrenintob.children - each child-to-child comparison recursively uses the same tree-distance function
The child-list part looks like classic sequence edit distance, except the substitution cost is itself a subtree-edit distance.
Runnable Python Example
This example uses subtree size as the cost of deleting or inserting an entire subtree. That is a simple and defensible cost model for many applications.
Why Ordered Versus Unordered Matters
If child order matters, the DP above is natural. If child order does not matter, you cannot just compare children position by position. You need to find the cheapest pairing between children, which makes the problem more expensive and more complex.
That distinction is why two tree problems that sound similar can have very different algorithmic difficulty.
Use Cases
This kind of edit distance appears in:
- comparing parsed syntax trees
- UI state diffing
- structured document comparison
- data synchronization and migration tools
- phylogenetic or hierarchical structure analysis
The exact cost model may vary, but the recurrence structure stays similar.
Common Pitfalls
- Skipping the definition of allowed operations and then discovering later that two teams meant different problems.
- Applying an ordered-tree algorithm to trees where child order is semantically irrelevant.
- Forgetting that insert and delete costs often apply to entire subtrees, not just to one node label.
- Writing a purely recursive solution without memoization and then hitting exponential blowups.
- Assuming tree isomorphism and tree edit distance are the same problem. They are related but not identical.
Summary
- The minimal-operations version of this problem is usually tree edit distance.
- For rooted ordered trees, dynamic programming on child sequences is the standard approach.
- Node relabeling cost combines with insert and delete costs on subtrees.
- Ordered and unordered trees require different algorithms.
- Memoization is essential if you want the recursive formulation to remain practical.

