As you may know, consistent hashing is a great idea for DHTs. The main idea is that when a node is added or removed, only a small fraction of the keys has to be remapped.
From the original paper:
When a machine is added to or removed from the set of caches, the expected fraction of objects that must be moved to a new cache is the minimum needed to maintain a balanced load across the caches.
The scheme is great, but keys can end up badly distributed across the nodes. To solve that, replicas (virtual nodes) of each original node are distributed randomly around the ring. That solution works quite well. Look at this chart if you want to be sure.
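For concreteness, here is a minimal Python sketch of what such a ring with replicas might look like. The class name HashRing, the "node#i" labelling of the virtual nodes, and the choice of MD5 are my own illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal consistent-hash ring with virtual nodes ("replicas").
# Illustrative sketch only; names and hash choice are assumptions.
import bisect
import hashlib

class HashRing:
    def __init__(self, replicas=20):
        self.replicas = replicas   # virtual nodes per physical node
        self.ring = {}             # ring position -> physical node
        self.sorted_keys = []      # sorted ring positions

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def remove_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self.ring[pos]
            self.sorted_keys.remove(pos)

    def get_node(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```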
OK, that seems to work well. But there is something I've been thinking about that nobody seems to mention.
What happens when one node is added (or removed)? Well, every key that falls "before" the new node's position (between it and its predecessor on the ring) needs to be remapped. That seems fine, because those keys are not "all" the keys. But if we decide to place some replicas, say 20, then 20 ranges of the ring are affected, so 20 nodes feel the pain of rehashing.
Fewer replicas mean a worse distribution, but more replicas mean more pain when rehashing is needed.
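To put numbers on that trade-off, one could run a quick experiment like the following, reusing the hypothetical HashRing sketch above. The node and key names and the sizes are arbitrary; it just counts how many keys map to a different node after one node joins.

```python
# Measure how many keys actually move when one node joins the ring.
import random  # not strictly needed; keys below are deterministic

def moved_fraction(num_nodes=10, replicas=20, num_keys=100_000):
    keys = [f"key-{i}" for i in range(num_keys)]

    ring = HashRing(replicas)
    for n in range(num_nodes):
        ring.add_node(f"node-{n}")
    old = {k: ring.get_node(k) for k in keys}

    ring.add_node("node-new")  # one more node joins
    moved = sum(1 for k in keys if ring.get_node(k) != old[k])
    return moved / num_keys

# Per the property quoted from the paper, roughly 1/(num_nodes+1) of the
# keys should move, spread over many small ranges when replicas are used.
print(moved_fraction())
```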
What solutions do you know of that would suit this situation? Am I missing something?
It looks like you are trying to solve a distribution issue by increasing the number of replicas, when a "better" hashing function would do the trick. Good hash functions provide good distributions (see MD5, SHA, etc.).
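As a quick sanity check of a hash function's spread, one could count how many keys each node ends up with. This reuses the hypothetical HashRing sketch from the question (which already positions virtual nodes with MD5); the numbers are only illustrative.

```python
# Count how many keys land on each node; a well-spread hash keeps the
# per-node counts close together even with a modest number of replicas.
from collections import Counter

def distribution(num_nodes=10, replicas=20, num_keys=100_000):
    ring = HashRing(replicas)
    for n in range(num_nodes):
        ring.add_node(f"node-{n}")
    return Counter(ring.get_node(f"key-{i}") for i in range(num_keys))

print(distribution())
```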