algoviz - Minimum Spanning Tree — Kruskal's Graph Algorithm

Minimum Spanning Tree

Graph & Pathfinding

Finds the minimum cost tree connecting all vertices using Kruskal's algorithm.

A Minimum Spanning Tree (MST) connects all vertices in a weighted, undirected graph with the minimum total edge weight, using exactly V-1 edges (where V is the number of vertices). The concept of spanning trees dates back to the work of Otakar Boruvka in 1926, who developed the first known MST algorithm to design an efficient electrical power network for Moravia. This visualization shows Kruskal's algorithm, published by Joseph Kruskal in 1956, which builds the MST using a greedy edge-selection strategy powered by the Union-Find data structure.

How Kruskal's Algorithm Works

Sort edges: Sort all E edges by weight in ascending order. This is the most expensive step and determines the overall time complexity.
Initialize: Create a Union-Find (disjoint set) data structure where each vertex starts as its own separate component. This means the forest initially consists of V isolated trees, each containing a single vertex.
Process edges: Iterate through the sorted edges in order of increasing weight. For each edge (u, v), use the Union-Find find operation to check if u and v belong to different components.
- If yes, add the edge to the MST and use the union operation to merge the two components into one. This edge is safe because it cannot create a cycle between two previously disconnected components.
- If no (both endpoints are already in the same component), skip the edge. Adding it would create a cycle, violating the tree property.
Complete: Stop when V-1 edges have been added to the MST. At this point, all vertices are connected in a single spanning tree with minimum total weight.

The Union-Find Data Structure

The efficiency of Kruskal's algorithm depends critically on the Union-Find (also called Disjoint Set Union or DSU) data structure. Two key optimizations make Union-Find nearly constant time per operation. Path compression flattens the tree structure during find operations by making every node point directly to the root. Union by rank always attaches the shorter tree under the root of the taller tree during merges. Together, these optimizations yield an amortized time of O(alpha(n)) per operation, where alpha is the inverse Ackermann function, which grows so slowly that it is effectively constant (at most 4 for any practical input size). This is one of the most elegant data structures in computer science and appears in many applications beyond MST algorithms.

Complexity

Metric	Value
Time	O(E log E)
Space	O(V + E)

The sorting step dominates the runtime at O(E log E). Since E is at most V squared, O(E log E) is equivalent to O(E log V). With path compression and union by rank, all Union-Find operations across the algorithm take nearly O(E) total time, which is absorbed by the sorting cost.

Comparison with Prim's Algorithm

The two classic MST algorithms are Kruskal's and Prim's, and each has different strengths. Kruskal's algorithm operates globally by processing edges in sorted order, making it naturally suited for sparse graphs where E is close to V. It also works well when the edge list is already available and easy to sort. Prim's algorithm, by contrast, grows the MST from a single starting vertex, always adding the cheapest edge that connects the current tree to a new vertex. Prim's uses a priority queue and runs in O(E log V) with a binary heap, making it more efficient for dense graphs where E approaches V squared. With a Fibonacci heap, Prim's achieves O(E + V log V), which is asymptotically faster for very dense graphs.

A key theoretical result is the Cut Property, which guarantees that both algorithms produce correct MSTs: for any cut of the graph that separates vertices into two sets, the minimum-weight edge crossing the cut must belong to the MST.

When to Use

MST algorithms are used in network design where the goal is minimizing total connection cost, such as laying out electrical wiring, telecommunications cables, water pipelines, or computer network infrastructure. In cluster analysis, removing the most expensive edges from an MST produces single-linkage clustering, a technique used in data mining and bioinformatics. MSTs also provide 2-approximations for the Traveling Salesman Problem on metric graphs, and they appear in image segmentation algorithms where pixels are vertices and edge weights represent color differences. Kruskal's algorithm is preferred when working with sparse graphs or when edges are naturally provided as a list, while Prim's is better for dense graphs represented as adjacency matrices.