Optimizing Bulk Indexing in Elasticsearch

We have an Elasticsearch cluster of 3 nodes, each with the following configuration:

 CPU cores   Memory (GB)   Disk (GB)   IO performance
 36          244.0         48000       very high

The machines are in 3 different zones, namely eu-west-1c, eu-west-1a, and eu-west-1b.

Each Elasticsearch instance is allocated 30 GB of heap space.

We are using this cluster for running aggregations only. The cluster has a replication factor of 1, all string fields are not analyzed, and doc_values is true for all fields.

We are pumping data into this cluster by running 6 instances of Logstash in parallel (each with a batch size of 1000).
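For readers unfamiliar with what a batch of 1000 looks like on the wire, here is a minimal sketch of how events could be grouped into bulk request bodies. It is not what Logstash does internally; the index name `events`, the sample documents, and the helper names `batches` and `build_bulk_body` are all made up for illustration. The optional `doc_type` is an assumption for older Elasticsearch versions that expect a `_type` in the action line.

```python
import json


def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_body(docs, index_name, doc_type=None):
    """Serialize documents into the newline-delimited body used by the _bulk endpoint.

    Each document becomes two lines: an action line and the document source.
    `doc_type` is optional and only added when given (assumption: older versions need it).
    """
    lines = []
    for doc in docs:
        action = {"_index": index_name}
        if doc_type is not None:
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample events; real events would come from the log pipeline.
    events = [{"message": "event %d" % i, "value": i} for i in range(2500)]
    for n, chunk in enumerate(batches(events, 1000)):
        body = build_bulk_body(chunk, "events")
        print("request %d: %d documents, %d bytes" % (n, len(chunk), len(body)))
```

Each request body is held in memory on the receiving node until it is processed, so the batch size and the number of parallel senders together determine how much data is in flight at once.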

When more instances of Logstash are started one by one, the nodes of the Elasticsearch cluster start throwing out-of-memory errors.

What optimizations could speed up the bulk indexing rate on the cluster? Would placing the cluster's nodes in the same zone increase the bulk indexing rate? Would adding more nodes to the cluster help?

A couple of steps taken so far:

Increased the bulk queue size from 50 to 1000
Increased the refresh interval from 1 second to 2 minutes
Changed segment merge throttling to none (https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html)
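To make the three changes above concrete, here is a rough sketch of the settings they correspond to, written as a small Python helper that only builds and prints the payloads. The setting names `threadpool.bulk.queue_size` and `indices.store.throttle.type` are the ones used by the older Elasticsearch releases covered by the linked guide; they are an assumption about the version in use and have been renamed or removed in later releases. The function name `tuning_settings` and the `target` labels are made up for illustration.

```python
import json


def tuning_settings(refresh_interval="2m", bulk_queue_size=1000):
    """Return the settings described above, grouped by where they would be applied.

    Assumption: setting names follow the older Elasticsearch releases
    documented in the linked guide; newer versions differ.
    """
    return {
        # Per-index setting, applied through the index settings API.
        "index": {
            "target": "PUT /<index>/_settings",
            "body": {"index": {"refresh_interval": refresh_interval}},
        },
        # Cluster-wide transient setting that disables store throttling.
        "cluster": {
            "target": "PUT /_cluster/settings",
            "body": {"transient": {"indices.store.throttle.type": "none"}},
        },
        # Node-level setting, typically placed in elasticsearch.yml.
        "node": {
            "target": "elasticsearch.yml",
            "body": {"threadpool.bulk.queue_size": bulk_queue_size},
        },
    }


if __name__ == "__main__":
    for name, entry in tuning_settings().items():
        print("%s -> %s" % (name, entry["target"]))
        print(json.dumps(entry["body"], indent=2))
```

Note that a larger bulk queue means more pending bulk requests can sit in heap at the same time.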

We cannot set the replication factor to 0 because of the inconsistency that would result if one of the nodes went down.
