Tree Structures
Algorithm Design
Data Structures
Computational Complexity
Tree Isomorphism

Calculate minimal operations to make two tree structures identical

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

The standard way to frame this problem is as a tree edit-distance problem: what is the minimum number of insert, delete, and relabel operations needed to transform one tree into another? Once you say it that way, the task stops being vague and becomes a dynamic-programming problem on rooted subtrees. The exact algorithm depends on whether the trees are ordered or unordered, because child ordering changes the matching problem significantly.

Define The Edit Operations Clearly

Before writing code, decide what counts as an operation. A common model is:

  • insert a node
  • delete a node
  • relabel a node

If node labels match, relabel cost is 0; otherwise it is usually 1. For ordered trees, child position matters, so you compare child sequences in order. For unordered trees, the problem is harder because you also have to solve an assignment problem between children.

This article sticks to rooted ordered trees, which is the usual tractable version for implementation.

A Simple Tree Representation

Here is a small Python representation for ordered trees.

python
1class Node:
2    def __init__(self, value, children=None):
3        self.value = value
4        self.children = children or []

The edit distance between two trees then becomes a recursive cost on node labels plus a sequence-edit cost on the child lists.

Core Dynamic-Programming Idea

Suppose you want the edit distance between two nodes a and b.

  1. pay 0 if a.value == b.value, otherwise pay 1 for relabeling
  2. compute the minimum cost to transform the ordered list a.children into b.children
  3. each child-to-child comparison recursively uses the same tree-distance function

The child-list part looks like classic sequence edit distance, except the substitution cost is itself a subtree-edit distance.

Runnable Python Example

python
1from functools import lru_cache
2
3class Node:
4    def __init__(self, value, children=None):
5        self.value = value
6        self.children = children or []
7
8
9def subtree_size(node):
10    return 1 + sum(subtree_size(child) for child in node.children)
11
12
13def tree_edit_distance(a, b):
14    @lru_cache(maxsize=None)
15    def dist(id_a, id_b):
16        node_a = nodes_a[id_a]
17        node_b = nodes_b[id_b]
18
19        relabel = 0 if node_a.value == node_b.value else 1
20        children_a = node_a.children
21        children_b = node_b.children
22
23        m, n = len(children_a), len(children_b)
24        dp = [[0] * (n + 1) for _ in range(m + 1)]
25
26        for i in range(1, m + 1):
27            dp[i][0] = dp[i - 1][0] + subtree_size(children_a[i - 1])
28        for j in range(1, n + 1):
29            dp[0][j] = dp[0][j - 1] + subtree_size(children_b[j - 1])
30
31        for i in range(1, m + 1):
32            for j in range(1, n + 1):
33                delete_cost = dp[i - 1][j] + subtree_size(children_a[i - 1])
34                insert_cost = dp[i][j - 1] + subtree_size(children_b[j - 1])
35                match_cost = dp[i - 1][j - 1] + dist(id(children_a[i - 1]), id(children_b[j - 1]))
36                dp[i][j] = min(delete_cost, insert_cost, match_cost)
37
38        return relabel + dp[m][n]
39
40    def collect(root):
41        out = {}
42        stack = [root]
43        while stack:
44            node = stack.pop()
45            out[id(node)] = node
46            stack.extend(node.children)
47        return out
48
49    nodes_a = collect(a)
50    nodes_b = collect(b)
51    return dist(id(a), id(b))
52
53
54t1 = Node("A", [Node("B"), Node("C")])
55t2 = Node("A", [Node("B"), Node("D")])
56print(tree_edit_distance(t1, t2))

This example uses subtree size as the cost of deleting or inserting an entire subtree. That is a simple and defensible cost model for many applications.

Why Ordered Versus Unordered Matters

If child order matters, the DP above is natural. If child order does not matter, you cannot just compare children position by position. You need to find the cheapest pairing between children, which makes the problem more expensive and more complex.

That distinction is why two tree problems that sound similar can have very different algorithmic difficulty.

Use Cases

This kind of edit distance appears in:

  • comparing parsed syntax trees
  • UI state diffing
  • structured document comparison
  • data synchronization and migration tools
  • phylogenetic or hierarchical structure analysis

The exact cost model may vary, but the recurrence structure stays similar.

Common Pitfalls

  • Skipping the definition of allowed operations and then discovering later that two teams meant different problems.
  • Applying an ordered-tree algorithm to trees where child order is semantically irrelevant.
  • Forgetting that insert and delete costs often apply to entire subtrees, not just to one node label.
  • Writing a purely recursive solution without memoization and then hitting exponential blowups.
  • Assuming tree isomorphism and tree edit distance are the same problem. They are related but not identical.

Summary

  • The minimal-operations version of this problem is usually tree edit distance.
  • For rooted ordered trees, dynamic programming on child sequences is the standard approach.
  • Node relabeling cost combines with insert and delete costs on subtrees.
  • Ordered and unordered trees require different algorithms.
  • Memoization is essential if you want the recursive formulation to remain practical.

Course illustration
Course illustration

All Rights Reserved.