Topological Sort

Graph & Pathfinding

Orders vertices in a directed acyclic graph so every edge goes from earlier to later.

Topological Sort produces a linear ordering of vertices in a Directed Acyclic Graph (DAG) such that for every directed edge (u, v), vertex u appears before vertex v in the ordering. This fundamental graph algorithm was first formally described in the context of partial order theory but became practically important with the rise of computing, where dependency resolution is ubiquitous. The BFS-based approach was published by Arthur Kahn in 1962, and it remains one of the most intuitive and widely implemented topological sorting techniques. The algorithm is the backbone of scheduling tasks with dependencies, and any system that must respect "do A before B" constraints relies on topological ordering.

How It Works

Kahn's Algorithm (BFS-based):

  1. Compute in-degrees: For each vertex in the DAG, count the number of incoming edges. Vertices with in-degree 0 have no prerequisites and can be processed immediately.
  2. Initialize queue: Add all vertices with in-degree 0 to a queue. These are the starting points, the tasks that have no dependencies.
  3. Process vertices: Remove a vertex from the queue, append it to the result ordering, and examine all its outgoing edges. For each neighbor, decrease its in-degree by 1. This simulates "completing" the current task and removing its constraint on downstream tasks.
  4. Enqueue freed vertices: If any neighbor's in-degree drops to 0, it means all its prerequisites have been processed. Add it to the queue so it can be processed next.
  5. Repeat: Continue until the queue is empty. If the result contains all V vertices, the ordering is valid. If some vertices remain unprocessed, the graph contains a cycle and no topological ordering exists.

DFS-based approach:

  1. Run DFS: Initiate a depth-first search from each unvisited vertex in the graph.
  2. Post-order recording: After all descendants of a vertex have been fully explored (all recursive DFS calls have returned), push the vertex onto a stack. This ensures that a vertex is recorded only after everything that depends on it has already been recorded.
  3. Reverse for result: The stack, when popped from top to bottom, gives a valid topological ordering. Alternatively, you can append vertices to a list and reverse it at the end.

Both approaches produce valid topological orderings, but they may produce different orderings since multiple valid topological sorts can exist for a given DAG. Kahn's algorithm naturally processes vertices in a "breadth-first" order and can be adapted to produce a lexicographically smallest ordering by using a min-heap instead of a simple queue. The DFS-based approach is often more concise to implement and integrates naturally with other DFS-based graph analyses like cycle detection and strongly connected component identification.

Cycle Detection

A critical property of topological sort is that it can only succeed on acyclic graphs. In Kahn's algorithm, cycle detection is built in: if the final result contains fewer vertices than the total vertex count, the remaining vertices are part of one or more cycles (they could never reach in-degree 0 because they depend on each other circularly). In the DFS approach, cycles can be detected by maintaining a "currently in recursion stack" set and checking whether a visited neighbor is still on the stack. This makes topological sort a practical tool for validating dependency graphs before executing tasks.

Complexity

MetricValue
TimeO(V + E)
SpaceO(V)

Both Kahn's algorithm and the DFS approach visit each vertex once and process each edge once, yielding linear time complexity. The space is O(V) for the in-degree array (Kahn's) or the recursion stack and visited set (DFS), plus O(V) for the output ordering.

When to Use

Topological sort is used wherever tasks have prerequisites or dependency chains. Build systems like Make and Bazel use it to determine the correct compilation order so that every source file is compiled after its dependencies. Package managers such as apt, npm, and pip use topological ordering to install packages in the right sequence, ensuring that dependencies are available before the packages that need them. University course scheduling systems model courses as vertices and prerequisite relationships as edges, then compute a topological order to produce a valid semester-by-semester plan. Compilers use topological sort for instruction scheduling and register allocation. Spreadsheet applications evaluate cell formulas in topological order so that each cell is computed after all cells it references. Data pipeline orchestration tools like Apache Airflow model task DAGs and execute them in topological order. In every case, the pattern is the same: model constraints as directed edges and compute the ordering that respects all of them.