Algorithm to find lowest common ancestor in directed acyclic graph?

Directed Acyclic Graph

Lowest Common Ancestor

Graph Algorithms

Computer Science

Data Structures

Algorithm to find lowest common ancestor in directed acyclic graph?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Finding a lowest common ancestor in a tree is straightforward because every node has exactly one parent. In a directed acyclic graph, that assumption disappears, so the problem needs a more careful definition before you even choose an algorithm.

For a DAG, two nodes can have multiple common ancestors, and there may be several equally low answers. In practice, the usual goal is to return all common ancestors that are not ancestors of any other common ancestor.

Define "Lowest" Before Writing Code

Let u and v be two nodes in a DAG. A node x is a common ancestor if there is a directed path from x to u and also from x to v.

In a tree, the lowest common ancestor is unique. In a DAG, uniqueness is not guaranteed. Imagine this shape:

'A points to B and C'
'B points to D'
'C points to E'
'D and E both point to F'

Depending on which two target nodes you choose, you may end up with multiple incomparable common ancestors. That is why many DAG algorithms return a set rather than a single node.

The most useful definition is:

Find all common ancestors of u and v.
Remove any ancestor that can reach another common ancestor.
The remaining nodes are the lowest common ancestors.

A Practical Algorithm

For a one-off query, a clean approach is:

Build the reverse graph so you can walk from a node to its parents.
Collect all ancestors of u.
Collect all ancestors of v.
Intersect the two ancestor sets.
From that intersection, keep only the nodes that do not reach any other node in the same intersection.

The first four steps are easy. The last step is where the DAG-specific reasoning happens. If common ancestor x can reach common ancestor y, then x is higher, not lower, so x should be removed.

Python Example

The example below represents edges as parent-to-child links and returns the set of lowest common ancestors for a pair of nodes:

python

1from collections import defaultdict, deque
2
3
4def build_reverse_graph(graph):
5    parents = defaultdict(list)
6    for parent, children in graph.items():
7        for child in children:
8            parents[child].append(parent)
9    return parents
10
11
12def collect_ancestors(start, parents):
13    seen = set()
14    stack = [start]
15
16    while stack:
17        node = stack.pop()
18        for parent in parents[node]:
19            if parent not in seen:
20                seen.add(parent)
21                stack.append(parent)
22
23    return seen
24
25
26def can_reach(graph, source, target):
27    queue = deque([source])
28    seen = {source}
29
30    while queue:
31        node = queue.popleft()
32        if node == target:
33            return True
34        for child in graph.get(node, []):
35            if child not in seen:
36                seen.add(child)
37                queue.append(child)
38
39    return False
40
41
42def lowest_common_ancestors(graph, u, v):
43    parents = build_reverse_graph(graph)
44    common = collect_ancestors(u, parents) & collect_ancestors(v, parents)
45
46    result = set(common)
47    for a in common:
48        for b in common:
49            if a != b and can_reach(graph, a, b):
50                result.discard(a)
51                break
52
53    return result
54
55
56graph = {
57    "A": ["B", "C"],
58    "B": ["D", "E"],
59    "C": ["E"],
60    "D": ["F"],
61    "E": ["F", "G"],
62}
63
64print(lowest_common_ancestors(graph, "F", "G"))

Output:

text

{'E'}

In this graph, both A and B are also ancestors of F and G, but E is lower because it sits closer to both targets.

Improving Query Performance

The implementation above is easy to understand, but it recomputes reachability during the filtering step. That is acceptable for small graphs or occasional queries, but not for a workload with many LCA lookups.

For repeated queries, common optimizations are:

precompute ancestor bitsets in topological order
precompute transitive closure when the graph is small enough
memoize reachability results
reduce the graph to a relevant subgraph before filtering

If the DAG is fixed and queries are frequent, preprocessing usually pays for itself. If the graph changes often, the simpler per-query traversal may still be the better engineering choice.

Complexity Discussion

Let V be the number of vertices and E the number of edges.

Collecting ancestors of each target takes O(V + E) in the worst case. The expensive part in the simple implementation is the pairwise reachability test inside the common ancestor set. If that set is large, the total cost can grow noticeably.

That does not mean the approach is wrong. It means the right algorithm depends on the query pattern:

one or two queries on a moderate graph: traversal-based code is fine
thousands of queries on a static graph: invest in preprocessing

Common Pitfalls

The biggest mistake is assuming the DAG answer is always a single node. That is true for trees, but not for general DAGs. Your API should usually return a collection, even if many examples happen to produce one element.

Another bug is mixing edge direction. Ancestors are nodes that can reach the target following the original edge direction. In code, that usually means traversing the reverse graph from the target upward.

Some implementations also confuse "lowest by distance" with "lowest by ancestry." A node is lower if it is not an ancestor of another common ancestor, not merely because it has a shorter path length in one traversal.

Finally, do not overengineer with matrix-based transitive closure unless the graph is small and dense. For most application code, DFS or BFS plus a clear definition is easier to maintain.

Summary

In a DAG, the lowest common ancestor is often a set, not a single node.
A practical solution is to collect ancestors of both targets, intersect them, and remove higher common ancestors.
Traversing the reverse graph is the natural way to compute ancestor sets.
Repeated reachability checks are acceptable for small or infrequent queries but may need preprocessing at scale.
Correctness depends on a precise definition of "lowest," not just on graph traversal mechanics.