Sort by date and number of writes by day in ElasticSearch (Secondary sort)

256 views Asked by At

My index has two fields -

  • updated which is a date field (eg. "2020-01-04T05:00:06.870000Z")
  • numWrites which is a float

I need a query that -

  • sorts by date (YYYY-mm-dd) in descending order
  • for each day, sorts by numWrites in descending order

Sample data :

  "_source": {
    "updated": "2020-01-04T05:00:06.870000Z",
    "numWrites": 5.0
  }

  
  "_source": {
    "updated": "2020-01-04T09:00:08.870000Z",
    "numWrites": 3.0
  }

  "_source": {
    "updated": "2019-12-04T01:00:06.870000Z",
    "numWrites": 15.0
  }

  "_source": {
    "updated": "2019-12-04T04:00:06.870000Z",
    "numWrites": 12.0
     }
}

The following query sorts by date. But, within the same day, it doesn't sort by numWrites as expected, as the timestamps within the same day are different. How can I extract date in the from YYYY-mm-dd and then sort by numWrites within a day ?

Query :

{
  "sort":[
     {"updated": {"order" : "desc"}},
     {"numWrites": {"order" : "desc"}}
  ]
} 

Results :

 "_source": {
    "updated_time": "2020-01-04T09:00:08.870000Z",
    "numWrites": 3.0
  }

"_source": {
    "updated": "2020-01-04T05:00:06.870000Z",
    "numWrites": 5.0
  }
    
"_source": {
    "updated_time": "2019-12-04T04:00:06.870000Z",
    "numWrites": 12.0
 }

 "_source": {
    "updated_time": "2019-12-04T01:00:06.870000Z",
    "numWrites": 15.0
  }
1

There are 1 answers

2
Ashee On

If I have understood your question well, you can use a sub aggregation to sort by date first, then by numWrites for each of those days. Here is a suggested solution:

"size": 0,
"aggs": {
    "sort_by_date": {
    "terms": {
        "field": "updated",
        "order": {
        "_key": "desc"
        }
    },
    "aggs": {
        "sort_by_numWrites_per day": {
        "terms": {
            "field": "numWrites",
            "order": {
            "_key": "desc"
            }
        }
        }
    }
    }
}

I tried adding two numWrites on the same date. Here is my sample index:

    {
    "_index" : "test-sort",
    "_type" : "_doc",
    "_id" : "2kRNZ3QByAa8PXf3rJBC",
    "_score" : 1.0,
    "_source" : {
    "updated" : "2020-01-04T05:00:06.870000Z",
    "numWrites" : 5.0
    }
},
{
    "_index" : "test-sort",
    "_type" : "_doc",
    "_id" : "20RNZ3QByAa8PXf3rJBC",
    "_score" : 1.0,
    "_source" : {
    "updated" : "2020-01-04T09:00:08.870000Z",
    "numWrites" : 3.0
    }
},
{
    "_index" : "test-sort",
    "_type" : "_doc",
    "_id" : "3ERNZ3QByAa8PXf3rJBC",
    "_score" : 1.0,
    "_source" : {
    "updated" : "2019-12-04T01:00:06.870000Z",
    "numWrites" : 15.0
    }
},
{
    "_index" : "test-sort",
    "_type" : "_doc",
    "_id" : "3URNZ3QByAa8PXf3rJBC",
    "_score" : 1.0,
    "_source" : {
    "updated" : "2019-12-04T04:00:06.870000Z",
    "numWrites" : 12.0
    }

I have two numWrites for the date "2019-12-04T04:00:06.870Z" and the result I obtained is:

    {
    "key" : 1578128408870,
    "key_as_string" : "2020-01-04T09:00:08.870Z",
    "doc_count" : 1,
    "sort_by_numWrites_per day" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
        {
        "key" : 3.0,
        "doc_count" : 1
        }
    ]
    }
},
{
    "key" : 1578114006870,
    "key_as_string" : "2020-01-04T05:00:06.870Z",
    "doc_count" : 1,
    "sort_by_numWrites_per day" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
        {
        "key" : 5.0,
        "doc_count" : 1
        }
    ]
    }
},
{
    "key" : 1575432006870,
    "key_as_string" : "2019-12-04T04:00:06.870Z",
    "doc_count" : 2,
    "sort_by_numWrites_per day" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
        {
        "key" : 12.0,
        "doc_count" : 1
        },
        {
        "key" : 10.0,
        "doc_count" : 1
        }
    ]
    }
},
{
    "key" : 1575421206870,
    "key_as_string" : "2019-12-04T01:00:06.870Z",
    "doc_count" : 1,
    "sort_by_numWrites_per day" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
        {
        "key" : 15.0,
        "doc_count" : 1
        }