Disjoint-set data structure

A union-find data structure maintains a partition of a set. The abstract data type is formed by the set of three operations { Union, Find, Make- Set }. Union -Find structures are used for the management of decompositions into disjoint sets. It gets a lot of decomposition associated with a canonical element, which serves as the name of the set. Union brings together two such quantities, Find ( x ) determines the canonical element of those set containing x, and make- set produces a singleton set { x} with the canonical element x.

Applications

Union -Find structures are well suited to manage connected components of the graph. Kruskal 's algorithm requires, for example, such a data structure to be implemented efficiently.

Definition

A finite set is partitioned into disjoint classes:

For each class a representative is selected. The corresponding union-find structure supports the following operations efficiently:

Init (): Initializes the structure and forms for each a separate class with a representative.
Union (,): Combines the two classes that belong to the two representatives and, and appointed as the new representatives of the new class.
Find ( ): Determines the unique representative among its class.

Implementation

A trivial implementation stores from the affiliations between the elements and the representatives in an array. However, quantities of trees are used in practice for shorter durations. The representatives will be stored in the roots of the trees, the other elements of the relevant Class in the nodes below it.

Union (,): Depends on the root of the lower tree as a new child under the root of the tree higher (weighted association ). If now is a child of, and be reversed.
Find ( ): migrates from node of the path within the tree up to the root and returns as a result.

Heuristics

To speed up the operations and Find Union, there are the two heuristics Union -by -size and path compression.

Union -by - Size

During the operation Union ( r, s) the tree, that is smaller, hung under the larger tree. This prevents that individual subtrees can degenerate into lists as in the naive implementation (r is in each case the root of the new subtree ). The depth of a subtree T can thus not be greater than. With this heuristic, the worst-case running time of Find of reduced to. For an efficient implementation is carried with the number of elements in the subtree at each root.

Path compression

To speed up later Find ( x) lookups, one tries the paths from nodes visited to the associated root shortening.

Maximum shortening ( Wegkompression )

After running Find ( x) all nodes are set to the path of x to the root directly under the root. This method provides the greatest cost advantage for subsequent calls to Find for a node in the same subtree as compared to the two following. Although this needs to be considered, each node on the path between the root and x twice, for the duration, however, that does not matter.

Allocation method ( splitting)

During the passage is allowed each node point to its previous grandfather ( if any); so is split into two by a half of the length of current path.

Bisection method ( halving )

During the passage can be shown every other node on his recent grandfather.

These methods both have the same amortized cost as the top compression method ( write node under the root ). All compression methods accelerate future Find ( x ) operations.

Maturities

Union-find data structures enable the execution of the above operations with the following maturities ( ):

Trivial implementation with an array A (A [i ] = x: node i is the quantity x), worst-case: Find, Union:

Implementation with trees:

Without heuristics: Find, Union
With union -by -size, worst-case: Find, Union
With union -by -size, path compression, the worst-case: Find, Union
By Union -by -size, path compression, series of m, n- 1 Find Union operations:

It is the inverse of the Ackermann function. It grows very slowly and is for all "practical" input ( ) is smaller than 5 In 1977, Robert Tarjan has shown that for a sequence of m Find- and n- 1 Union operations each data structure in the pointer - machine model for a period of needed. Fredman and Saks showed in 1989 that this lower bound model is also valid in the more general cell sample. It follows that the data structure with union -by -size and path compression has an asymptotically optimal running time, and there can be no union-find data structure that both Find and Union amortized O ( 1 ) executes.

Kruskal's algorithm Run time (program lifecycle phase) Amortized analysis Java-Applet Digital Object Identifier

645963