Elasticsearch: accuracy of a filter aggregation


I'm fairly new to Elasticsearch (using version 2.2). To simplify my question, I have documents that have a field named termination, which can sometimes take the value transfer.

I currently run this request to count, per month, the documents that have that termination:

{
  "size": 0,
  "sort": [{
    "@timestamp": {
      "order": "desc",
      "unmapped_type": "boolean"
    }
  }],
  "query": { "match_all": {} },
  "aggs": {
    "report": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month",
        "min_doc_count": 0
      },
      "aggs": {
        "calls_with_termination_transfer": {
          "filter": {
            "term": {
              "termination": "transfer"
            }
          }
        }
      }
    }
  }
}

Here is the response:

{
    "_shards": {
        "failed": 0, 
        "successful": 206, 
        "total": 206
    }, 
    "aggregations": {
        "report": {
            "buckets": [
                {
                    "calls_with_termination_transfer": {
                        "doc_count": 209163
                    }, 
                    "doc_count": 278100, 
                    "key": 1451606400000, 
                    "key_as_string": "2016-01-01T00:00:00.000Z"
                }, 
                {
                    "calls_with_termination_transfer": {
                        "doc_count": 107244
                    }, 
                    "doc_count": 136597, 
                    "key": 1454284800000, 
                    "key_as_string": "2016-02-01T00:00:00.000Z"
                }
            ]
        }
    }, 
    "hits": {
        "hits": [], 
        "max_score": 0.0, 
        "total": 414699
    }, 
    "timed_out": false, 
    "took": 90
}

Why is the total number of hits (414699) greater than the sum of the bucket document counts (278100 + 136597 = 414697)? I had read about accuracy problems, but they didn't seem to apply to filters. Is there also an accuracy problem if I sum the per-month counts of documents with termination transfer?

Best answer, by TautrimasPajarskas:

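My guess is that some documents are missing the @timestamp field. A date_histogram aggregation only puts a document into a bucket if it has a value for the histogram field, whereas hits.total counts every document matched by the query (here match_all). Documents without @timestamp are therefore counted in hits.total but in none of the monthly buckets.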
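One way to account for those documents, and to answer the second question, is to add a missing aggregation next to the histogram. The sketch below assumes the same index and field names as the question; the aggregation name no_timestamp is only an illustrative choice. As far as I know, filter and date_histogram counts are exact (the approximation issues usually discussed concern terms aggregations across shards), so if the guess is right, the bucket doc_counts plus no_timestamp.doc_count should add up to hits.total, and the transfer count under no_timestamp is what the monthly transfer sum leaves out.

```json
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "report": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month",
        "min_doc_count": 0
      },
      "aggs": {
        "calls_with_termination_transfer": {
          "filter": { "term": { "termination": "transfer" } }
        }
      }
    },
    "no_timestamp": {
      "missing": { "field": "@timestamp" },
      "aggs": {
        "calls_with_termination_transfer": {
          "filter": { "term": { "termination": "transfer" } }
        }
      }
    }
  }
}
```

If you only need the overall number of transfer documents, a top-level filter aggregation (outside the date_histogram) counts them directly, regardless of whether they have a timestamp.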

You could verify this by running an exists query on this field.
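A minimal sketch of that check, using the bool/must_not form of an exists query, which should work on 2.x. It counts the documents that have no @timestamp value. If the guess is correct, hits.total should come out at 2, the gap between 414699 and 414697, assuming nothing was indexed between the two requests.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "@timestamp" }
      }
    }
  }
}
```

Setting size to a small positive number instead of 0 returns the offending documents themselves, which can help find out why they were indexed without a timestamp.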