Script using data from terms aggregation buckets in Elasticsearch

I would like to calculate a percentage between the document counts of two buckets of a terms aggregation in Elasticsearch.

My query:

{
   "query" : {
      "match_all" : {}
   },
   "size" : 0,
   "aggs": {
      "eventName" : {
         "terms" : { "field" : "json.eventName" }
      }
   }
}

Aggregation result:

"aggregations": {
    "eventName": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
            {
                "key": "term1",
                "doc_count": 30235
            },
            {
                "key": "term2",
                "doc_count": 30216
            },
            {
                "key": "term3",
                "doc_count": 22177
            },
            {
                "key": "term4",
                "doc_count": 17173
            }
        ]
    }
}

I want a metric like this between "term1" and "term4": 56% (term4's doc_count divided by term1's, i.e. 17173 / 30235).
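Since the wanted metric is just one bucket's doc_count divided by another's, it can also be computed client-side from the response above, without any scripting. A minimal Python sketch, assuming the "aggregations" section has already been parsed into a dict; the helper name bucket_ratio is hypothetical.

```python
# Client-side ratio of two bucket counts from a terms-aggregation response.
# The helper name `bucket_ratio` is hypothetical; `aggregations` mirrors the
# "aggregations" excerpt shown in the question.


def bucket_ratio(aggregations: dict, agg_name: str, numerator: str, denominator: str) -> float:
    """Return doc_count(numerator) / doc_count(denominator) for two bucket keys."""
    counts = {b["key"]: b["doc_count"] for b in aggregations[agg_name]["buckets"]}
    if numerator not in counts or denominator not in counts:
        # A key can be absent if it fell outside the terms agg's `size` (default 10).
        raise KeyError("bucket key not present in the terms aggregation response")
    return counts[numerator] / counts[denominator]


aggregations = {
    "eventName": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
            {"key": "term1", "doc_count": 30235},
            {"key": "term2", "doc_count": 30216},
            {"key": "term3", "doc_count": 22177},
            {"key": "term4", "doc_count": 17173},
        ],
    }
}

ratio = bucket_ratio(aggregations, "eventName", "term4", "term1")
print(f"{ratio:.1%}")  # prints 56.8%
```

This only works when both keys come back as buckets, which is why the helper raises instead of silently returning a wrong value.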

Answer by Tomer Cagan

I think a scripted_metric aggregation could help.

Take a look at my answer to a different question.

In your case, you could count the documents for each of the two terms and then return term4Cnt / term1Cnt. A rough sketch of the scripts you'd need:

"init_script": "_agg.term1Cnt = 0; _agg.term4Cnt = 0;",
"map_script": "name = doc['json.eventName'].value; if (name == 'term1') { _agg.term1Cnt += 1 } else if (name == 'term4') { _agg.term4Cnt += 1 }",
"reduce_script": "term1Cnt = 0; term4Cnt = 0; for (agg in _aggs) { term1Cnt += agg.term1Cnt; term4Cnt += agg.term4Cnt }; return term1Cnt == 0 ? 0 : term4Cnt / term1Cnt"
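To make the division of work between the scripts concrete, here is a pure-Python model of how a scripted_metric evaluates them: init and map run once per shard, and reduce runs once over the per-shard states. It is not Elasticsearch code; the two-shard split and sample event names are invented for illustration.

```python
# Pure-Python model of the scripted_metric phases used in the answer.
# NOT Elasticsearch code: shards and documents are invented for illustration.


def init_state() -> dict:
    # init_script: one fresh state object per shard
    return {"term1Cnt": 0, "term4Cnt": 0}


def map_doc(state: dict, event_name: str) -> None:
    # map_script: called once for every matching document on the shard
    if event_name == "term1":
        state["term1Cnt"] += 1
    elif event_name == "term4":
        state["term4Cnt"] += 1


def reduce_states(states: list) -> float:
    # reduce_script: sum the per-shard counts, then divide term4 by term1
    term1 = sum(s["term1Cnt"] for s in states)
    term4 = sum(s["term4Cnt"] for s in states)
    return term4 / term1 if term1 else 0.0


def run(shards: list) -> float:
    states = []
    for docs in shards:
        state = init_state()
        for name in docs:
            map_doc(state, name)
        # The answer's scripts have no combine step, so the raw state goes to reduce.
        states.append(state)
    return reduce_states(states)


# Two hypothetical shards, each a list of json.eventName values
shards = [
    ["term1", "term4", "term2", "term1"],
    ["term4", "term1", "term3", "term1"],
]
print(run(shards))  # prints 0.5 (2 term4 docs / 4 term1 docs)
```

The counts must be summed across shards before dividing; averaging per-shard ratios would give a different, generally wrong, answer.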

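This assumes that you know your terms (event names) in advance. You can also add a query that filters on the relevant events, so the map script only runs over those documents. Note that _agg and _aggs come from the older Groovy-based scripting; in recent Painless-based versions the equivalent variables are state and states, and a combine_script is required.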
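A minimal sketch of such a filter, written as a Python dict for the search body: a terms query on json.eventName limits the matched documents to the two events, and aggregations only run over matched documents. This assumes json.eventName holds exact (not analyzed) values; the helper name filtered_request is hypothetical, and a scripted_metric aggregation would sit in the same aggs section.

```python
import json

# Hypothetical helper: builds a search body that only matches the given
# event names. Assumes json.eventName is indexed as an exact-value field.


def filtered_request(event_names: list) -> dict:
    return {
        "size": 0,
        # terms query: only documents whose json.eventName is one of the
        # listed values match, so every aggregation below sees only them.
        "query": {"terms": {"json.eventName": event_names}},
        "aggs": {
            "eventName": {"terms": {"field": "json.eventName"}},
        },
    }


body = filtered_request(["term1", "term4"])
print(json.dumps(body, indent=2))
```

With this filter the terms aggregation from the question would return only the term1 and term4 buckets.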

Hope this helps.