Algorithm to find lowest common ancestor in directed acyclic graph?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding a lowest common ancestor in a tree is straightforward because every node has exactly one parent. In a directed acyclic graph, that assumption disappears, so the problem needs a more careful definition before you even choose an algorithm.
For a DAG, two nodes can have multiple common ancestors, and there may be several equally low answers. In practice, the usual goal is to return all common ancestors that are not ancestors of any other common ancestor.
Define "Lowest" Before Writing Code
Let u and v be two nodes in a DAG. A node x is a common ancestor if there is a directed path from x to u and also from x to v.
In a tree, the lowest common ancestor is unique. In a DAG, uniqueness is not guaranteed. Imagine this shape:
- '
Apoints toBandC' - '
Bpoints toD' - '
Cpoints toE' - '
DandEboth point toF'
Depending on which two target nodes you choose, you may end up with multiple incomparable common ancestors. That is why many DAG algorithms return a set rather than a single node.
The most useful definition is:
- Find all common ancestors of
uandv. - Remove any ancestor that can reach another common ancestor.
- The remaining nodes are the lowest common ancestors.
A Practical Algorithm
For a one-off query, a clean approach is:
- Build the reverse graph so you can walk from a node to its parents.
- Collect all ancestors of
u. - Collect all ancestors of
v. - Intersect the two ancestor sets.
- From that intersection, keep only the nodes that do not reach any other node in the same intersection.
The first four steps are easy. The last step is where the DAG-specific reasoning happens. If common ancestor x can reach common ancestor y, then x is higher, not lower, so x should be removed.
Python Example
The example below represents edges as parent-to-child links and returns the set of lowest common ancestors for a pair of nodes:
Output:
In this graph, both A and B are also ancestors of F and G, but E is lower because it sits closer to both targets.
Improving Query Performance
The implementation above is easy to understand, but it recomputes reachability during the filtering step. That is acceptable for small graphs or occasional queries, but not for a workload with many LCA lookups.
For repeated queries, common optimizations are:
- precompute ancestor bitsets in topological order
- precompute transitive closure when the graph is small enough
- memoize reachability results
- reduce the graph to a relevant subgraph before filtering
If the DAG is fixed and queries are frequent, preprocessing usually pays for itself. If the graph changes often, the simpler per-query traversal may still be the better engineering choice.
Complexity Discussion
Let V be the number of vertices and E the number of edges.
Collecting ancestors of each target takes O(V + E) in the worst case. The expensive part in the simple implementation is the pairwise reachability test inside the common ancestor set. If that set is large, the total cost can grow noticeably.
That does not mean the approach is wrong. It means the right algorithm depends on the query pattern:
- one or two queries on a moderate graph: traversal-based code is fine
- thousands of queries on a static graph: invest in preprocessing
Common Pitfalls
The biggest mistake is assuming the DAG answer is always a single node. That is true for trees, but not for general DAGs. Your API should usually return a collection, even if many examples happen to produce one element.
Another bug is mixing edge direction. Ancestors are nodes that can reach the target following the original edge direction. In code, that usually means traversing the reverse graph from the target upward.
Some implementations also confuse "lowest by distance" with "lowest by ancestry." A node is lower if it is not an ancestor of another common ancestor, not merely because it has a shorter path length in one traversal.
Finally, do not overengineer with matrix-based transitive closure unless the graph is small and dense. For most application code, DFS or BFS plus a clear definition is easier to maintain.
Summary
- In a DAG, the lowest common ancestor is often a set, not a single node.
- A practical solution is to collect ancestors of both targets, intersect them, and remove higher common ancestors.
- Traversing the reverse graph is the natural way to compute ancestor sets.
- Repeated reachability checks are acceptable for small or infrequent queries but may need preprocessing at scale.
- Correctness depends on a precise definition of "lowest," not just on graph traversal mechanics.

