Why does Elasticsearch still use simple modulo-based routing?


Just wondering why Elasticsearch still uses the simple routing approach (a hash of the routing value modulo the number of shards) to decide which shard a document is stored on. This approach prevents us from changing the number of shards later. If Elasticsearch used something like consistent hashing (or an even better technique), we would have a chance to change the shard count in the future. Does anyone have an explanation or idea about this?
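To make the limitation concrete, here is a minimal sketch of modulo routing and what happens to document placement when the shard count grows from 5 to 6. It assumes an MD5-based stand-in hash (Elasticsearch itself hashes the routing value with Murmur3), and the names `stable_hash`, `modulo_shard`, and `moved_fraction` are made up for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Stand-in hash for illustration only (assumption): first 8 bytes of MD5.
    # Elasticsearch uses Murmur3 on the routing value instead.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_shard(routing: str, num_shards: int) -> int:
    # Simple routing: shard = hash(routing) % number_of_shards
    return stable_hash(routing) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    # Fraction of keys whose shard changes when the shard count changes.
    moved = sum(
        1
        for k in keys
        if modulo_shard(k, old_shards) != modulo_shard(k, new_shards)
    )
    return moved / len(keys)


keys = [f"doc-{i}" for i in range(10_000)]
frac = moved_fraction(keys, 5, 6)
print(f"{frac:.1%} of documents would land on a different shard")
```

With a uniform hash, a key keeps its shard only when `h % 5 == h % 6`, which holds for about 1 in 6 hash values, so roughly five sixths of the documents would have to be reindexed elsewhere.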


There is 1 answer

ssgao On

As of Elasticsearch 6.1.0, index splitting is possible. See the release announcement: https://www.elastic.co/blog/elasticsearch-6-1-0-released

The Split Index documentation explains in more detail why Elasticsearch doesn't use consistent hashing:

Consistent hashing only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However Elasticsearch’s unit of storage, shards, are Lucene indices. Because of their search-oriented data structure, taking a significant portion of a Lucene index, be it only 5% of documents, deleting them and indexing them on another shard typically comes with a much higher cost than with a key-value store.
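The 1/N-th figure in that passage can be checked with a small hash-ring sketch. This is only an illustration of key placement under consistent hashing, not anything Elasticsearch implements; the `HashRing` class, the MD5 stand-in hash, and the choice of 200 virtual nodes per shard are all assumptions. It also says nothing about the cost the documentation is pointing at, namely deleting those documents from one Lucene index and reindexing them in another.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Stand-in hash for illustration only (assumption): first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes per shard."""

    def __init__(self, num_shards: int, vnodes: int = 200):
        # Each shard owns `vnodes` points on the ring.
        points = sorted(
            (stable_hash(f"shard-{s}#{v}"), s)
            for s in range(num_shards)
            for v in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def shard_for(self, key: str) -> int:
        # A key belongs to the first ring point clockwise from its hash.
        i = bisect.bisect_right(self._hashes, stable_hash(key))
        return self._owners[i % len(self._hashes)]


keys = [f"doc-{i}" for i in range(10_000)]
old_ring, new_ring = HashRing(5), HashRing(6)

# Keys whose owning shard differs between the 5-shard and 6-shard rings.
moved = [k for k in keys if old_ring.shard_for(k) != new_ring.shard_for(k)]
frac = len(moved) / len(keys)
print(f"{frac:.1%} of documents move when growing from 5 to 6 shards")
```

Every key that moves goes to the newly added shard, and the moved share stays close to one sixth. Even so, as the quoted documentation notes, relocating that fraction of a Lucene index is much more expensive than moving the same fraction of a key-value store.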