Elasticsearch: Possible to process aggregation results?

4.2k views Asked by At

I calculate the duration of my service-processes using the SUM-Aggregation. Each step of the executed process will be saved in Elasticsearch under a calling Id.

This is what I monitor:

Duration of Request-Processing for ID #123 (calling service #1)

Duration of Server-Response for ID #123 (calling service #1)

**Complete Duration for ID #123**

Duration of Request-Processing for ID #124 (calling service #1)

Duration of Server-Response for ID #124 (calling service #1)

**Complete duration for ID #124**

Filter:

{
"from" : 0, "size" :0,

    "query" : {
        "filtered" : {
            "query" : { "match_all" : {}},
            "filter" : {
                "term" : { 
                    "callingId" : "123",
                }
            }
        }
    },
    "aggs" : {
        "total_duration" : { "sum" : { "field" : "duration" } },
        "max_duration":{"max": {"field":"duration"}},   
        "min_duration":{"min":{"field":"duration"}}
        }
    }
    }

This returns the complete duration of the process and also tells me which part of the process was the fastest ans which part was the slowest.

Next I want to calculate the average duration of all finished processes by serviceId. In this case I only care about the total duration for each service, so I can comepare them.

How can I create the average, minimum and maximum from my total_durations?

EDIT: I added some sample Data, I hope you can work with it.

Call1:

{
"callerId":"U1",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Finish",
"status":"FINISHED",
"duration":1200,
"serviceId":"1"
}

sum: 1202

Call2:

{
"callerId":"U2",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":2,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Finish",
"status":"FINISHED",
"duration":1030,
"serviceId":"1"
}

sum: 1033

Aggregation for All Service-Calls for Service ID #1 This is what I want to calculate:

Max: 1202
Min: 1033
AVG: 1116
2

There are 2 answers

1
Andrei Stefan On BEST ANSWER

A bit more complicated, but here it goes (only in 1.4 because of this type of aggregation):

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "serviceId": 1
        }
      }
    }
  },
  "aggs": {
    "executionTimes": {
      "scripted_metric": {
        "init_script": "_agg['values'] = new java.util.HashMap();",
        "map_script": "if (_agg.values[doc['callerId'].value]==null) {_agg.values[doc['callerId'].value]=doc['duration'].value;} else {_agg.values[doc['callerId'].value].add(doc['duration'].value);}",
        "combine_script":"someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x]; sum=0; for(y in value) {sum+=y}; someHashMap.put(x,sum)}; return someHashMap;",
        "reduce_script": "finalArray = []; finalMap = new java.util.HashMap(); for(map in _aggs){for(x in map.keySet()){if(finalMap.containsKey(x)){value=finalMap.get(x);finalMap.put(x,value+map.get(x));} else {finalMap.put(x,map.get(x))}}}; finalAvgValue=0; finalMaxValue=-1; finalMinValue=-1; for(key in finalMap.keySet()){currentValue=finalMap.get(key);finalAvgValue+=currentValue; if(finalMinValue<0){finalMinValue=currentValue} else if(finalMinValue>currentValue){finalMinValue=currentValue}; if(currentValue>finalMaxValue) {finalMaxValue=currentValue}}; finalArray.add(finalMaxValue); finalArray.add(finalMinValue); finalArray.add(finalAvgValue/finalMap.size()); return finalArray",
        "lang": "groovy"
      }
    }
  }
}

Also, I'm not saying it's the best approach, but only one I could find. Also, I'm not saying that the solution is in its best form. Probably, it may be cleaned up and improved. I wanted to show, though, that it is possible. Keep in mind, though, it's available in 1.4.

The basic idea of the approach is to use the scripts to build a data structure that should hold the information you need, computed in different steps according to scripted metric aggregation. Also, the aggregation is performed for only one serviceId. If you want to do this for all serviceIds I think you might want to re-think a bit the data structure in the scripts.

For the query above and for the exact data you provided the output is this:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "executionTimes": {
         "value": [
            1202,
            1033,
            "1117.5"
         ]
      }
   }
}

The order of values in the array value is [max, min, avg], as per the script in reduce_script.

0
Lama On

There will be a new feature in upcoming version 2.0.0 called "Reducers". Reducers will allow you to calculate aggregations over aggregations.

Related Post: https://github.com/elasticsearch/elasticsearch/issues/8110