Topological Sort and Advanced Graphs

Topics Covered

Topological Sort Concept

What Topological Sort Is

Why It Only Works on DAGs

Real-World Applications

Connection to Partial Orders

Why This Matters in Interviews

Kahn's Algorithm (BFS)

The Algorithm Step by Step

Concrete Example

Why BFS-Based

Complete Implementation

Time and Space Complexity

Course Schedule II

Practice

DFS-Based Topological Sort

Why Post-Order Reversal Works

The Algorithm

Concrete Example

Complete Implementation

Comparing DFS and Kahn's

Cycle Detection in Directed Graphs

Why Two Colors Are Not Enough for Directed Graphs

The Three-Color (Three-State) Approach

Walkthrough

Complete Three-Color DFS Implementation

Cycle Detection via Kahn's Algorithm

Course Schedule (Cycle Detection Application)

Choosing Between the Two Approaches

Practice

Weighted Shortest Paths

Why BFS Is Not Enough

Dijkstra's Algorithm Step by Step

Concrete Example

Complete Implementation

Why Non-Negative Weights Are Required

Time and Space Complexity

Network Delay Time

Bellman-Ford: A Brief Mention

Practice

Graphs appear everywhere in real software, but a surprising number of interview problems boil down to one question: in what order should I process these nodes? Topological sort answers that question for directed acyclic graphs (DAGs), and understanding it deeply unlocks an entire family of problems that look different on the surface but share the same underlying structure.

What Topological Sort Is

A topological sort of a directed acyclic graph is a linear ordering of its vertices such that for every directed edge u to v, vertex u appears before vertex v in the ordering. Think of it as flattening a DAG into a line where every arrow points forward, never backward.

Consider a university course catalog where each course has prerequisites. If Data Structures requires Intro to Programming, and Algorithms requires Data Structures, then any valid semester plan must list Intro to Programming before Data Structures, and Data Structures before Algorithms. A topological sort produces exactly this kind of valid ordering.

 
Intro to Programming -> Data Structures -> Algorithms
                                         -> Operating Systems

Both [Intro, Data Structures, Algorithms, OS] and [Intro, Data Structures, OS, Algorithms] are valid topological orderings. Algorithms and OS have no dependency between them, so either can come first.

Why It Only Works on DAGs

A topological ordering is impossible when the graph contains a cycle. If A depends on B, B depends on C, and C depends on A, there is no way to place all three in a line where every edge points forward. At least one edge must point backward, violating the topological property. This is not just a theoretical concern - in real systems, circular dependencies cause deadlocks in build systems, infinite loops in package managers, and unresolvable ordering in task schedulers.

The contrapositive is equally important: if you can produce a valid topological ordering, the graph has no cycles. This makes topological sort a cycle detection tool, not just an ordering tool.

Real-World Applications

Build systems (Make, Bazel, Gradle): Source files and compilation targets form a dependency graph. The build system topologically sorts the targets to determine which must be compiled first. A cyclic dependency means the build cannot proceed.

Package managers (pip, npm, apt): When you run pip install, the resolver builds a dependency graph of all packages and their version constraints. It topologically sorts this graph to determine the installation order - a package is installed only after all its dependencies are in place.

Task scheduling: In any pipeline where tasks have dependencies - data pipelines, CI/CD stages, manufacturing workflows - topological sort determines a valid execution order. Tasks with no unmet dependencies can execute in parallel.

Spreadsheet evaluation: Cells referencing other cells form a dependency graph. The spreadsheet engine topologically sorts cells to determine evaluation order. A circular reference (cycle) triggers an error.

Connection to Partial Orders

Topological sort is the bridge between a partial order and a total order. A DAG defines a partial order: some pairs of nodes have a defined before/after relationship (connected by a directed path), while others are incomparable (no path between them). A topological sort extends this partial order into a total order - every pair of nodes gets a position, and all the original ordering constraints are preserved.

This is why multiple valid topological orderings exist. The partial order constrains some pairs but leaves others free. Each topological sort resolves the ambiguity differently, but all respect the original constraints.

Interview Tip

In interviews, topological sort problems rarely announce themselves. They hide behind terms like 'ordering', 'scheduling', 'prerequisites', 'dependencies', and 'build order'. Whenever a problem gives you items with pairwise constraints of the form 'X must come before Y', you are looking at a topological sort problem on a DAG.

Why This Matters in Interviews

The Course Schedule family of problems on LeetCode is a direct application: given a list of courses and prerequisite pairs, determine if you can finish all courses (cycle detection), or produce a valid ordering (topological sort). But the pattern extends far beyond courses. Alien Dictionary asks you to derive character ordering from a sorted word list - the characters form a DAG based on the sorting constraints. Sequence Reconstruction asks whether a topological ordering is unique. Parallel Courses asks for the minimum number of semesters, which is the longest path in the DAG.

Every one of these problems uses the same core algorithms: Kahn's BFS-based approach or DFS-based post-order reversal. Master both, and you have a versatile toolkit for an entire problem category.

Kahn's algorithm is the BFS-based approach to topological sort. It builds the ordering by repeatedly removing nodes with no remaining dependencies - nodes whose in-degree is zero. The intuition is straightforward: if a node has no incoming edges, all its prerequisites are already satisfied, so it can safely go next in the ordering.

The Algorithm Step by Step

  1. Compute in-degrees. For every node in the graph, count the number of incoming edges. Store these counts in an array or dictionary.
  2. Initialize the queue. Add every node with in-degree 0 to a queue. These are the "sources" - nodes with no prerequisites.
  3. Process nodes. Dequeue a node, add it to the result list. For each neighbor of this node (each node it has an edge to), decrement that neighbor's in-degree by 1.
  4. Enqueue new sources. If any neighbor's in-degree drops to 0, add it to the queue. It is now a source - all its dependencies have been processed.
  5. Repeat until the queue is empty.
  6. Check for cycles. If the result list contains fewer nodes than the graph has, some nodes were never enqueued - they are trapped in a cycle where their in-degree never reached 0.

Concrete Example

Consider this graph with 6 nodes (0 through 5):

 
1Edges: 5->2, 5->0, 4->0, 4->1, 2->3, 3->1
2
3Adjacency list:
4  5 -> [2, 0]
5  4 -> [0, 1]
6  2 -> [3]
7  3 -> [1]
8  0 -> []
9  1 -> []
10
11In-degrees:  0:2  1:2  2:1  3:1  4:0  5:0

Step 1: Nodes 4 and 5 have in-degree 0. Enqueue both. Queue: [4, 5].

Step 2: Dequeue 4. Add to result. Decrement neighbors 0 and 1. In-degrees become 0:1, 1:1. Neither reaches 0, so nothing enqueued. Result: [4]. Queue: [5].

Step 3: Dequeue 5. Add to result. Decrement neighbors 2 and 0. In-degrees become 2:0, 0:0. Both reach 0, enqueue both. Result: [4, 5]. Queue: [2, 0].

Step 4: Dequeue 2. Add to result. Decrement neighbor 3. In-degree of 3 becomes 0, enqueue 3. Result: [4, 5, 2]. Queue: [0, 3].

Step 5: Dequeue 0. Add to result. No neighbors to process. Result: [4, 5, 2, 0]. Queue: [3].

Step 6: Dequeue 3. Add to result. Decrement neighbor 1. In-degree of 1 becomes 0, enqueue 1. Result: [4, 5, 2, 0, 3]. Queue: [1].

Step 7: Dequeue 1. Add to result. Result: [4, 5, 2, 0, 3, 1]. Queue empty.

Result size (6) equals node count (6), so no cycle exists. [4, 5, 2, 0, 3, 1] is a valid topological ordering.

Kahn's BFS topological sort
542031in:0in:0in:1in:2in:2in:2Order: []
Initialize in-degrees, enqueue nodes with 0 in-degree, process in waves. Each wave removes nodes and decrements neighbors' in-degrees.

Why BFS-Based

Kahn's algorithm processes nodes in "waves." The first wave contains all nodes with no dependencies. The second wave contains nodes whose only dependencies were in the first wave. The third wave contains nodes whose dependencies span the first two waves, and so on. Each wave represents a "level" of the dependency hierarchy.

This wave structure has a practical benefit: nodes within the same wave are independent and can be processed in parallel. The number of waves equals the length of the longest dependency chain in the DAG. This is directly useful for problems like Parallel Courses, where you need to find the minimum number of semesters to complete all courses.

Complete Implementation

Time and Space Complexity

Time: O(V + E). Computing in-degrees visits every edge once: O(E). Each node is enqueued and dequeued exactly once: O(V). For each dequeued node, we visit its outgoing edges: O(E) total across all nodes. Combined: O(V + E).

Space: O(V + E). The adjacency list stores all edges: O(V + E). The in-degree array uses O(V). The queue holds at most V nodes. The result list holds V nodes. Total: O(V + E).

Course Schedule II

The Course Schedule II problem asks: given numCourses and a list of prerequisite pairs [course, prerequisite], return a valid order to take all courses, or an empty array if impossible. This is a direct application of Kahn's algorithm.

The only subtlety is the direction of the edge. The prerequisite pair [course, prereq] means prereq must come before course, so the edge goes from prereq to course. Getting this direction wrong is the most common bug in topological sort interview solutions.

Step through topological ordering - watch in-degrees drop to zero as prerequisites are processed:

Loading animator...
Common Pitfall

When converting prerequisite pairs to directed edges, double-check the direction. The pair [a, b] could mean 'a requires b' (edge from b to a) or 'a must come before b' (edge from a to b), depending on the problem statement. Read the problem carefully - reversing the edge direction silently produces an incorrect ordering that may still pass some test cases.

Practice

Loading problem...

There is a second way to compute a topological ordering that uses depth-first search instead of BFS. The idea is elegant: run DFS from every unvisited node, and add each node to the result only AFTER all its descendants have been fully explored. Then reverse the result. This post-order-then-reverse approach produces a valid topological ordering, and understanding why it works deepens your grasp of both DFS and topological structure.

Why Post-Order Reversal Works

Consider a directed edge from u to v. During DFS, there are two possibilities:

  1. v is visited before u finishes. The DFS starting from u eventually reaches v (directly or through other nodes). Since DFS explores v fully before returning to u, v's post-order timestamp is earlier than u's. In the reversed result, u comes before v.
  2. v is already fully visited when u explores it. v was processed in a previous DFS call and already has an earlier post-order timestamp. In the reversed result, u (later timestamp, reversed to earlier position) comes before v - wait, this is wrong. Actually, since v finished before u's DFS even started exploring this edge, v's post-order number is smaller. After reversal, u (larger post-order, now earlier in reversed list) comes before v. The constraint is satisfied.

In both cases, u appears before v in the reversed post-order list, which is exactly what topological ordering requires for the edge u to v.

DFS topological sort
542031Post-order:[]Reversed:
Run DFS and record each node in post-order (after all descendants finish). Reversing the post-order list yields a valid topological ordering.

The Algorithm

  1. Maintain a visited set to avoid reprocessing nodes.
  2. For each unvisited node, start a DFS.
  3. In the DFS, mark the node as visited, recursively visit all unvisited neighbors, then append the node to a result list (post-order).
  4. After all DFS calls complete, reverse the result list.

The "visit all neighbors first, then record yourself" step is the key. A node is recorded only after everything reachable from it has been recorded. Reversing this means the node appears before everything reachable from it - exactly the topological property.

Concrete Example

Using the same graph as before: nodes 0-5, edges 5->2, 5->0, 4->0, 4->1, 2->3, 3->1.

Start DFS from node 0 (unvisited). Node 0 has no unvisited neighbors. Post-order: append 0. Post-order list: [0].

Node 1 is unvisited. No unvisited neighbors. Append 1. Post-order list: [0, 1].

Node 2 is unvisited. Visit neighbor 3. Node 3's neighbor 1 is already visited. Append 3. Back to 2, append 2. Post-order list: [0, 1, 3, 2].

Node 3 is already visited. Skip.

Node 4 is unvisited. Visit neighbor 0 (visited, skip). Visit neighbor 1 (visited, skip). Append 4. Post-order list: [0, 1, 3, 2, 4].

Node 5 is unvisited. Visit neighbor 2 (visited, skip). Visit neighbor 0 (visited, skip). Append 5. Post-order list: [0, 1, 3, 2, 4, 5].

Reverse: [5, 4, 2, 3, 1, 0]. This is a valid topological ordering. Every edge points forward: 5->2 (position 0 to 2), 5->0 (0 to 5), 4->0 (1 to 5), 4->1 (1 to 4), 2->3 (2 to 3), 3->1 (3 to 4).

Complete Implementation

Comparing DFS and Kahn's

Both approaches produce valid topological orderings, but they differ in several practical ways.

AspectKahn's (BFS)DFS Post-Order
Traversal styleLevel-by-level (waves)Depth-first (deep before wide)
Cycle detectionResult size < VBack edge detected via in_progress set
Parallelism infoNatural - each wave is a parallel batchNot directly available
ImplementationQueue + in-degree arrayRecursion + visited/in_progress sets
Iterative conversionStraightforward (already iterative)Requires explicit stack to avoid recursion limits
Lexicographic orderUse min-heap instead of queueMore complex to achieve

Kahn's algorithm is typically preferred when you need level information (minimum semesters, parallel batches) or lexicographic ordering. The DFS approach is preferred when you already have DFS infrastructure in your solution (e.g., you are also computing strongly connected components or doing other DFS-based analysis) or when the graph is represented in a way that makes neighbor iteration natural but in-degree computation expensive.

For interviews, know both. Some problems are more naturally solved with one approach, and being able to switch demonstrates depth of understanding.

Cycle detection in directed graphs is closely related to topological sort - a topological ordering exists if and only if the graph has no cycles. But cycle detection deserves its own treatment because the techniques differ in subtle but important ways from undirected cycle detection, and interview problems often ask specifically for cycle detection rather than a full topological ordering.

Why Two Colors Are Not Enough for Directed Graphs

In undirected graphs, cycle detection is simple: if DFS encounters a visited node that is not the immediate parent, there is a cycle. Two states (visited, unvisited) suffice because every edge to a visited non-parent node creates a cycle.

Directed graphs break this reasoning. Consider three nodes: A points to C, B points to C, and A points to B. Start DFS from A. Visit A, then B (via A->B), then C (via B->C). Mark C as visited, backtrack to B, mark B as visited, backtrack to A. Now visit A's other neighbor C - but C is already visited. With two-color logic, you would falsely report a cycle. But there is no cycle: A->B->C and A->C are just two paths converging on C.

The problem is that a two-color scheme cannot distinguish between a back edge (pointing to an ancestor in the current DFS path, indicating a cycle) and a cross edge (pointing to a node fully processed in a different DFS subtree, which is harmless). Directed graphs have both types, and only back edges indicate cycles.

The Three-Color (Three-State) Approach

The solution is to track three states for each node:

  • WHITE (unvisited): The node has not been discovered yet.
  • GRAY (in progress): The node is currently being explored - it is on the current DFS path from the root to the current node.
  • BLACK (completed): The node and all its descendants have been fully explored.

The critical rule: encountering a GRAY node during DFS means you have found a back edge, which means a cycle exists. The current node has an edge to an ancestor on the same DFS path, creating a cycle from that ancestor back to itself.

Encountering a BLACK node is safe - it was fully processed in a previous subtree and has no bearing on the current DFS path.

Cycle detection via back edge
ABCDWhite = unvisitedGray = in progressBlack = done
DFS uses three colors: white (unvisited), gray (in progress), black (done). Finding an edge to a gray node means a back edge exists, indicating a cycle.

Walkthrough

Consider this graph: A->B, B->C, C->A (a cycle), and D->B (a cross edge into the cycle).

 
1Start DFS from A (color A GRAY):
2  Visit B (color B GRAY):
3    Visit C (color C GRAY):
4      C has edge to A. A is GRAY (on current path).
5      BACK EDGE DETECTED - cycle exists: A -> B -> C -> A

Now consider the same graph without the C->A edge: A->B, B->C, D->B.

 
1Start DFS from A (color A GRAY):
2  Visit B (color B GRAY):
3    Visit C (color C GRAY):
4      C has no neighbors. Color C BLACK.
5    Color B BLACK.
6  Color A BLACK.
7
8Start DFS from D (color D GRAY):
9  D has edge to B. B is BLACK (fully processed).
10  This is a cross edge, not a back edge. No cycle.
11  Color D BLACK.
12
13No GRAY node was ever re-encountered. No cycle.

Complete Three-Color DFS Implementation

Cycle Detection via Kahn's Algorithm

The alternative approach is simpler to implement: run Kahn's algorithm and check if the result includes all nodes. If some nodes remain unprocessed (result size < V), those nodes are part of or connected to a cycle. Nodes in a cycle never reach in-degree 0 because each has at least one predecessor within the cycle that was also never processed.

Watch Kahn's algorithm detect a cycle when nodes remain unprocessed:

Loading animator...

Course Schedule (Cycle Detection Application)

The Course Schedule problem asks: given numCourses and prerequisite pairs, can you finish all courses? This is pure cycle detection via Kahn's algorithm. If the result contains fewer courses than the total, a cycle exists and you cannot finish all courses.

Note the edge reversal: the prerequisite pair [course, prereq] means prereq must come before course, so the directed edge goes from prereq to course. The cycle detection does not care about the direction of each individual edge - it only checks whether the overall graph has a cycle - but getting the construction right ensures you are checking the correct graph.

Choosing Between the Two Approaches

Use Kahn's when you only need to know whether a cycle exists and you want the simplest implementation. It naturally produces a count of processable nodes, and the cycle check is a one-line comparison.

Use three-color DFS when you need to identify which nodes are in the cycle, find the actual cycle path, or when the problem requires DFS for other reasons (e.g., you also need to compute connected components or classify edges). The three-color approach gives you more information about the graph's structure.

Practice

Loading problem...

Topological sort handles ordering in unweighted DAGs, but many real-world graph problems involve weighted edges. How long does a network packet take to reach every server? What is the cheapest flight route? These questions require shortest-path algorithms, and Dijkstra's algorithm is the most important one for interviews.

Why BFS Is Not Enough

In unweighted graphs, BFS finds shortest paths because every edge has the same cost - the shortest path is the one with the fewest edges. But when edges have different weights, the path with fewer edges might be more expensive than a longer path with lighter edges.

 
A --1--> B --1--> D     (total cost: 2)
A --5--> D              (total cost: 5)

Here BFS would find A->D in one step, but the shortest-weight path is A->B->D with cost 2. BFS minimizes hops, not total weight. Dijkstra's algorithm solves this by always expanding the node with the smallest known distance, ensuring that when a node is finalized, its shortest distance is correct.

Dijkstra's Algorithm Step by Step

  1. Initialize distances. Set the source node's distance to 0 and all other nodes to infinity. Use a min-heap (priority queue) initialized with (0, source).
  2. Extract minimum. Pop the node with the smallest distance from the heap. If this node has already been finalized (we have already processed it with a shorter distance), skip it.
  3. Relax edges. For each neighbor of the current node, compute the candidate distance: current node's distance + edge weight. If this candidate is less than the neighbor's known distance, update the neighbor's distance and push (new_distance, neighbor) onto the heap.
  4. Repeat until the heap is empty or all reachable nodes are finalized.

The term "relax" comes from the idea that you are loosening an upper bound. Initially every distance is infinity (the loosest possible bound), and each relaxation tightens it toward the true shortest distance.

Concrete Example

Consider this graph with 5 nodes:

 
1Edges (directed, with weights):
20 -> 1 (weight 4)
30 -> 2 (weight 1)
42 -> 1 (weight 2)
51 -> 3 (weight 1)
62 -> 3 (weight 5)
73 -> 4 (weight 3)

Starting from node 0:

Initial state: dist = [0, inf, inf, inf, inf]. Heap: [(0, 0)].

Step 1: Pop (0, 0). Process node 0.

  • Relax 0->1: dist[1] = min(inf, 0+4) = 4. Push (4, 1).
  • Relax 0->2: dist[2] = min(inf, 0+1) = 1. Push (1, 2).
  • Heap: [(1, 2), (4, 1)].

Step 2: Pop (1, 2). Process node 2.

  • Relax 2->1: dist[1] = min(4, 1+2) = 3. Push (3, 1).
  • Relax 2->3: dist[3] = min(inf, 1+5) = 6. Push (6, 3).
  • Heap: [(3, 1), (4, 1), (6, 3)].

Step 3: Pop (3, 1). Process node 1.

  • Relax 1->3: dist[3] = min(6, 3+1) = 4. Push (4, 3).
  • Heap: [(4, 1), (4, 3), (6, 3)].

Step 4: Pop (4, 1). Node 1 already finalized with distance 3. Skip.

Step 5: Pop (4, 3). Process node 3.

  • Relax 3->4: dist[4] = min(inf, 4+3) = 7. Push (7, 4).
  • Heap: [(6, 3), (7, 4)].

Step 6: Pop (6, 3). Node 3 already finalized with distance 4. Skip.

Step 7: Pop (7, 4). Process node 4. No neighbors.

Final distances from node 0: [0, 3, 1, 4, 7].

The shortest path to node 1 is 0->2->1 (cost 3), not the direct edge 0->1 (cost 4). Dijkstra found this because it processed node 2 (distance 1) before node 1 (distance 4), and the relaxation through node 2 discovered the cheaper route.

Dijkstra shortest path
411225301234dist: 0dist: infdist: infdist: infdist: infPQ:[(0, 0)]
Extract the minimum-distance node from the priority queue, relax its neighbors. Finalized nodes (green) have their shortest distance locked in.

Complete Implementation

Why Non-Negative Weights Are Required

Dijkstra's algorithm relies on a greedy invariant: once a node is popped from the heap and finalized, its shortest distance is correct and will never improve. This works because all edge weights are non-negative - any future path to this node would go through other nodes with equal or greater distance, adding non-negative weights, so it cannot be shorter.

With negative weights, this invariant breaks. A node finalized with distance 5 might later be reachable through a path with a -10 weight edge, giving distance -5. But Dijkstra has already committed to 5 and will not revisit the node. The result is incorrect shortest distances.

Interview Tip

If you encounter a shortest-path problem with negative edge weights, Dijkstra will not work. Use the Bellman-Ford algorithm instead, which relaxes all edges V-1 times and correctly handles negative weights (though not negative cycles). Bellman-Ford runs in O(V * E) time, slower than Dijkstra, which is the price of handling negative weights.

Time and Space Complexity

Time: O((V + E) log V) with a binary heap. Each node is extracted from the heap at most once: O(V log V). Each edge triggers at most one heap insertion: O(E log V). Combined: O((V + E) log V). With a Fibonacci heap, this improves to O(V log V + E), but Fibonacci heaps are rarely used in practice due to high constant factors.

Space: O(V + E). The adjacency list stores all edges. The distance array and finalized set each use O(V). The heap can hold up to O(E) entries (one per edge relaxation), though in practice it is much smaller.

Network Delay Time

The Network Delay Time problem asks: given a network of n nodes and weighted directed edges representing signal travel times, if a signal starts at node k, how long until all nodes receive the signal? If any node is unreachable, return -1.

This is a direct Dijkstra application. The answer is the maximum shortest distance to any node - the last node to receive the signal determines the total delay.

Watch Dijkstra propagate shortest distances through the network:

Loading animator...

Bellman-Ford: A Brief Mention

When edge weights can be negative, Bellman-Ford is the standard alternative. It works by relaxing every edge V-1 times. After V-1 iterations, the shortest distances are correct (assuming no negative cycles). A V-th iteration that still finds improvements indicates a negative cycle - a cycle whose total weight is negative, meaning shortest distances are undefined (you can loop forever getting shorter).

Bellman-Ford relaxes every edge V-1 times. After V-1 iterations, shortest distances are correct (assuming no negative cycles). A V-th iteration that still improves distances indicates a negative cycle. It runs in O(V * E) time - significantly slower than Dijkstra for dense graphs but necessary when negative weights exist.

Practice

Loading problem...