Consistent Hashing: what about rehashing?

1.5k views Asked by At

As you may know, consistent hashing is a great idea when dealing with DHT. The main idea is to not suffer too much when a new node is added or deleted.

From the original Paper:

When a machine is added to or removed from the set of caches, the expected fraction of objects that must be moved to a new cache is the minimum needed to maintain a balanced load across the caches.

The solution is great, but there is a phenomenon of bad distribution of the keys. To solve that, replicas of the original nodes are distributed randombly. That solution works quite well. Look at this chart if you want to be sure.

Ok, seems to work well. But, there is something i've been thinking that nobody mention.

What happens when one node is added (or removed)? Well, every key, "before" the node that is placed needs to be rehashed. That seems good, becouse those keys will not be "all" the keys. But, if we decide to place some replicas, say 20, then, 20 nodes will feel the pain of rehashing.

Less replicas means worse distribution, but more replicas means more pain when rehashing is needed.

What solution do you know would suit in this situation? Am I missing something?

2

There are 2 answers

0
Jérôme Verstrynge On

It looks like you are trying to solve a distribution issue by increasing the number of replicas, when a 'better' hashing function would do the trick. Good hash functions do provide good distributions (see MD5, SHA, etc...).

0
aaronsuperglue On

Yeah, I think you're looking at it the wrong way. I'll use the terms "node" for a machine and "object" for a thing to be cached.

On average, almost every node will be affected by a new add being added. This is not bad; it spreads the load of rehashing across all available nodes.

The important part is that most objects are not affected by the rehashing. On average, only 1/nodes objects will need to be reassigned; and on average, each node will only need to cope with transferring-away 1/nodes^2 nodes, which really cuts down on the impact of this.