I want to perform a unique count on my ElasticSearch cluster. The cluster contains about 50 million records.
I've tried the following methods:
First method
Mentioned in this section:
Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory.
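In practice this means running a cardinality aggregation against the pre-computed hash sub-field, roughly like the following (my_index is a placeholder; the field name matches my mapping below):

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "my_prop_unique_count": {
      "cardinality": {
        "field": "my_prop.hash"
      }
    }
  }
}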
Second method
Mentioned in this section:
Unless you configure Elasticsearch to use doc_values as the field data format, the use of aggregations and facets is very demanding on heap space.
My property mapping
"my_prop": {
"index": "not_analyzed",
"fielddata": {
"format": "doc_values"
},
"doc_values": true,
"type": "string",
"fields": {
"hash": {
"type": "murmur3"
}
}
}
The problem
When I run a unique count on my_prop.hash in Kibana, I receive the following error:
Data too large, data for [my_prop.hash] would be larger than limit
ElasticSearch has a 2g heap size. The above also fails for a single index with 4 million records.
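For reference, per-field fielddata memory usage can be inspected with the _cat/fielddata API (the field list here is just the two fields from my mapping):

GET /_cat/fielddata?v&fields=my_prop,my_prop.hash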
My questions
- Am I missing something in my configuration?
- Should I move to a bigger machine? This does not seem like a scalable solution.
ElasticSearch query
Generated by Kibana: http://pastebin.com/hf1yNLhE
That error says you don't have enough memory (more specifically, memory for fielddata) to store all the values from hash, so you need to take them off the heap and put them on disk, meaning using doc_values.

Since you are already using doc_values for my_prop, I suggest doing the same for my_prop.hash (and, no, the settings from the main field are not inherited by the sub-fields): "hash": { "type": "murmur3", "index": "no", "doc_values": true }.
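Put together with your existing mapping, the property would look roughly like this (same field names as above; only the sub-field settings change, and existing data needs to be reindexed for the new doc_values to take effect):

"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3",
      "index": "no",
      "doc_values": true
    }
  }
}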