Why merge by rank, not count?

Question

Why merge by rank, not count?

790 views Asked by Rafael Korbas At 09 January 2014 at 17:56

I was wondering, why in the union-find algorithm - you merge two trees by their height - attaching the smaller one to the taller one (in the simpler variant without path compression).

Would it be a worse approach if you merged them by their count of elements - attaching the tree with less elements to the tree with more elements?

Original Q&A

There are 2 answers

Henry On 09 January 2014 at 19:10

This is an interesting question. Without having done a deep analysis it seems to be better to keep the average distance of the nodes from their respective root at a minimum instead of the maximum distance. This would mean, adding the tree with less elements to the one with more elements is the better approach. By doing this, the average distance can increase by at most 1/2 while in the other case it can increase by up to 1.

**templatetypedef** · Accepted Answer · 2014-01-09T19:31:29+00:00

It's safe to merge either by height or by number of nodes.

When merging by height, a tree with n nodes in it will have height at most O(log n). To see this, note that at least one node is required to get a tree of height zero. From there, the only way to get a tree of height h + 1 is to merge together two trees of height h. This means that the number of nodes required to get height h, which we'll denote by N(h), is given by

N(0) ≥ 1

N(h + 1) ≥ 2N(h)

This recurrence solves to N(h) ≥ 2^h, which means that with n total nodes, we would get a maximum possible height of about log n. This upper bound is tight, since if you start with 2^h independent trees and repeatedly pairwise merge them, you'd produce a single tree of height h.

When merging by number of nodes, we can similarly show that a tree of n nodes will have height at most O(log n). We can use a similar argument. Let M(h) denote the minimum number of nodes necessary to create a tree of height h when merging by number of nodes. Clearly, M(0) = 1. So, what would have to happen to create a tree of height h + 1? This would require merging together two trees of height h together, and so we get that

M(0) = 1

M(h + 1) ≥ 2M(h)

This, as before, solves to M(h) ≥ 2^h, so not only is the upper bound also O(log n), it's the exact same upper bound we had before. Consequently, it's safe to union either by height or by number of nodes.

So why use height rather than number of nodes? One argument is to consider the number of bits of information required to store the height information. If each node stores the number of nodes below it, we need O(log n) bits per node to write out the size information. On the other hand, if each node only stores the height it's at, a number which is O(log n), we only need O(log log n) bits per node to write everything out, which is exponentially less space. Moreover, the optimization of using path compression and union-by-rank is specifically designed to use height information rather than size information, so without repeating the (frighteningly hard) runtime analysis we can't conclude whether we still get the same time bound.

Hope this helps!

TechQA.

Why merge by rank, not count?

There are 2 answers

Related Questions in ALGORITHM

Related Questions in DATA-STRUCTURES

Related Questions in UNION-FIND

Popular Questions

Popular Tags

Trending Questions