## Union-Find

The Union-Find data structure is well known in the image processing community because of its use in efficient connected component labeling algorithms. It is also an important part of Kruskal’s algorithm for the minimum spanning tree. It is used to keep track of equivalences: are these two objects equivalent/connected/joined? You can think of it as a forest of trees. The nodes in the trees are the objects. If two nodes are in the same tree, they are equivalent. It is called Union-Find because it is optimized for those two operations, Union (joining two trees) and Find (determining if two objects are in the same tree). Both operations are performed in (essentially) constant time (actually it is O(α(n)), where α is the inverse Ackermann function, which grows extremely slowly and is always less than 5 for any number you can write down).

Here I’ll describe the data structure and show how its use can significantly speed up certain types of operations.

### Implementation

The Union-Find data structure is basically an array *A*, where an object with ID (label, index) *i* is located at *A*[*i*]. The value at that location is the ID of the parent node. Thus, the tree structure only contains pointers from the children to the parent. A node that points at itself (*A*[*i*]==*i*) is the root of a tree. In C++:

class UnionFind { private: std::vector<int> A; };

Three operations are required on this tree:

- Create: add a new node.
- Find: find the root of a tree.
- Union: join two trees.

The Create operation simply extends the array *A* by one element and sets its value to its index. This creates a new tree, the new node is its own parent:

int UnionFind::Create() { int i = A.size(); A.push_back(i); return i; }

The Find operation needs to follow parent links until it reaches a node that points to itself. This is the root. The trick that makes this operation constant time is that, every time we look for a root, we set all nodes along the path to point directly to the root. This makes each tree completely flat. The next time the same node is queried, its result is obtained directly:

int Find(int i) const { if (A[i] == index) { return i; } else { int root = Find(A[i]); A[i] = root; return root; } }

Finally, the Union operation simply makes one root point to the other. It uses the Find operation on its two operands, so that any two nodes can be used to merge their trees:

int Union(int i1, int i2) { i1 = Find(i1); i2 = Find(i2); if (i1 == i2) { return i1; } else { int root = std::min(i1, i2); int leaf = std::max(i1, i2); A[leaf] = root; return root; } }

Here, I picked the smaller of the indices to be the root. I have seen implementations that pick one node at random, but this is computationally more expensive. Other implementations keep track of a “rank” for each tree, and merge the tree with the lower rank into the one with the higher rank. This keeps the trees flatter. However, because of the flattening in the Find operations, trees are pretty flat anyway. The most efficient approach would be to simply take `i1` as root.

Note that it is possible to extend the above and add some additional information to each root, such as the size of the connected component for connected component labeling. The Union then must add the sizes for the two roots, and assign that to the new root.

### Watershed region merging

The first time I implemented a Union-Find–based algorithm, I did not know what Union-Find was (I re-invented the wheel). And consequently I missed the important detail of flattening the trees (my wheel was square). This was the “fast” watershed algorithm in DIPlib and DIPimage. It is an algorithm that takes some shortcuts to speed up the watershed computation. It sorts all pixels, then processes them in this sorted order. Plateaus (regions of constant value in the image) are not handled correctly by this algorithm, but a built-in merging procedure solves that issue. Basically, when two different regions first meet, a decision is made to keep them separate and draw a watershed line, or to merge them. The Union-Find structure is used here to keep track of these merged regions.

Let’s time this watershed in DIPimage 2:

img = repmat(readim,[4,3]); t1 = timeit(@()watershed(img,1,0,0)) % merge only within plateaus t2 = timeit(@()watershed(img,1,20,0)) % merge more aggressively

This leads to `t1 = 0.4034` and `t2 = 1.5193`. As you can see, more merging means deeper trees, which means it takes more time to find the root. In DIPlib 3 I implemented a correct Union-Find data structure, and plugged it into the same watershed algorithm. The same test now gives `t1 = 0.0819` and `t2 = 0.0821`. That is, with little merging the algorithm is 5x faster, and no matter how much we merge, it takes the same amount of time. (Do note that some of that 5x speedup is caused by the way that DIPimage 2 handled input argument parsing, I did not eliminate that from these tests.)

### Connected component labeling

DIPlib 2 used a fairly efficient connected component labeling algorithm that works for images of any dimensionality, and can be adjusted to use any connectivity. It walks over the image until it finds a foreground pixel that hasn’t been processed yet, assigns a new label to it, then applies a flood filling algorithm to visit all pixels in the connected component and assign the same label to them. It then continues where it left off to search for the next unprocessed pixel. In DIPlib 3 I copied the fastest algorithm from YACCLAB, which is specific to 2D images and 8-connected neighborhood, and wrote a similar algorithm that works for images of any dimensionality and for any connectivity. This algorithm makes two passes over the image. First it labels each pixel it finds, and uses the Union-Find data structure to keep track of which labels are connected (and thus equivalent). In a second pass it paints each output pixel with the index of the corresponding root node. This second step also allows the output labels to be consecutive.

Let’s time these two different labeling algorithms:

img = repmat(readim,[4,3]) > 100; % a large 2D binary image t1 = timeit(@()label(img)) % 8-connected, DIPlib 3 uses algorithm from YACCLAB t2 = timeit(@()label(img,1)) % 4-connected, DIPlib 3 uses my own algorithm img = repmat(readim('chromo3d'),[5,4,10]) > 100; % a large 3D binary image t3 = timeit(@()label(img)) % DIPlib 3 uses my own algorithm

Under DIPimage 2, this code leads to `t1 = 0.0197`, `t2 = 0.0187` and `t3 = 1.1520`. Under DIPimage 3 I see `t1 = 0.0045`, `t2 = 0.0051` and `t3 = 0.5352`. So the difference between the fastest implementation in YACCLAB and my own, generic algorithm is fairly small, but there is a difference of a factor of about 2x with the non–Union-Find algorithm (plus again some overhead for parameter parsing in the DIPimage 2 toolbox). Do note that the old algorithm’s running time depends on the number and size of objects, whereas the Union-Find algorithm is fairly consistent in timing.

### Detecting local maxima

In DIPlib 2 I had implemented a local maxima algorithm that used a similar flood-fill method to handle plateaus that are local maxima. I recently implemented a new algorithm that uses Union-Find for these. The new algorithm again walks over the image once, assigns a label to pixels that seem like they are local maxima or part of a local maximum plateau. Other pixels belonging to such a plateau are assigned the same label. At times a pixel is processed that “ties” two differently labeled regions together, these two regions are merged. When a pixel on such a plateau makes it clear that the plateau is not a local maximum (it has a higher-valued neighbor), the region is merged with the “background” (zero) label, identifying it as not a local maximum. Like in the connected component labeling algorithm, a second pass paints all pixels with a consistent label, and resets all pixels that are not local maxima. This algorithm is significantly more efficient for images with many plateaus. For example:

img = repmat(readim,[4,3]); t1 = timeit(@()maxima(img)) img = round(gaussf(img,2)/16)*16; % image with many plateaus t2 = timeit(@()maxima(img))

With DIPimage 2 I get `t1 = 0.0397` and `t2 = 0.0657`. The second image needs nearly twice as much time because of the many plateaus. With DIPimage 3 this is `t1 = 0.0185` and `t2 = 0.0198`. That is, the two images run in nearly the same amount of time. The two-fold difference for the first image is in part due to more efficient pixel access. While porting (and rewriting!) algorithms for DIPlib 3 I paid a lot more attention to walking over the image in memory storage order, and avoiding conditional statements within tight loops.

Hi Cris,

Enjoyed your blog as always! Nice to read how you have been able to implement these algorithms in such an improved way!

Geert