Disjoint Set Union (DSU)

Disjoint Set Union (DSU)

The Disjoint Set Union (DSU) or Union-Find data structure is used to keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It's particularly useful for problems involving connected components in graphs.

Core Operations

A DSU supports two primary operations:

1. Find:

Determine the representative (or root) of the set containing a particular element. This can be used for determining if two elements are in the same set (i.e., if find(a) == find(b)).

2. Union:

Merge the two sets containing two different elements into a single set.

Implementation & Optimizations

A DSU is typically implemented using an array, often called parent, where parent[i] stores the parent of element i. The representative of a set is an element that is its own parent (parent[i] == i).

A naive implementation can be slow. Two key optimizations make DSU nearly constant time on average for each operation.

1. Path Compression:

During a Find(i) operation, after finding the root of the set for element i, we make all nodes on the path from i to the root point directly to the root. This dramatically flattens the tree structure for future Find operations on these elements.

2. Union by Size / Rank:

This is a strategy for the Union operation. Instead of arbitrarily making one tree a child of the other, we attach the smaller tree to the root of the larger tree to keep the trees from getting too deep.

  • Union by Size: Track the number of elements in each set. Attach the smaller set to the root of the larger set. Generally preferred as it's slightly easier to implement.
  • Union by Rank: Track the "rank" (an upper bound on the height) of each tree. Attach the shorter tree to the root of the taller one.

When both optimizations are used, the time complexity for m operations on n elements is O(m * α(n)), where α(n) is the inverse Ackermann function, a very slowly growing function that is practically constant (≤ 5) for all realistic inputs.

Applications

  • Kruskal's algorithm for Minimum Spanning Tree: Used to check if adding an edge creates a cycle.
  • Finding connected components in a graph.
  • Network connectivity problems.