ElasticSearch aggregate on a scripted nested field

Question

ElasticSearch aggregate on a scripted nested field

711 views Asked by Doogyb At 03 March 2022 at 16:33

I have the following mapping in my ElasticSearch index (simplified as the other fields are irrelevant:

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

The data looks like this (again simplified):

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

I'm trying perform a bucket aggregation on the maximum value of float_property for each document. So for the example above, the following would be the desired response:

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

as doc a's highest nested value for float_property is 0.6, b's is 0.9 and c's is 0.9.

I've tried using a mixture of nested and aggs, along with runtime_mappings, but I'm not sure in which order to use these, or if this is even possible.

Original Q&A

There are 2 answers

Nishikant Tayade On 04 March 2022 at 07:01

I created the index with your mappings changing float_property type to double.

PUT testindex
{
 "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "include_in_parent": true, 
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "double"
            }
          }
        }
      }
    }
}

Added "include_in_parent":true for "nested" type

Indexed the documents:

PUT testindex/_doc/1
{
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  }
 
PUT testindex/_doc/2
{
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  }
  
PUT testindex/_doc/3 
{
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }

Then Terms Aggregation on nested field:

POST testindex/_search
{
  "from": 0,
  "size": 30,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "entities.float_property": {
      "terms": {
        "field": "entities.float_property"
      }
    }
  }
}

It produce the aggregation result as below:

"aggregations" : {
    "entities.float_property" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0.2,
          "doc_count" : 2
        },
        {
          "key" : 0.9,
          "doc_count" : 2
        },
        {
          "key" : 0.4,
          "doc_count" : 1
        },
        {
          "key" : 0.6,
          "doc_count" : 1
        }
      ]
    }
  }

**Doogyb** · Accepted Answer · 2022-03-04T17:15:44+00:00

I've managed to figure this out in the end.

The two things I hadn't realised were:

You can provide a script instead of a field key to bucket aggregations.
Instead of using nested queries, you can access nested values directly using params._source.

The combination of these two things allowed me to write the correct query:

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}

Response:

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

I'm confused though, because I thought the correct way to access nested fields was by using the nested query type. Unfortunately there's very little documentation for this, so I'm still unsure if this is the intended/correct way to aggregate on scripted nested fields.

TechQA.

ElasticSearch aggregate on a scripted nested field

There are 2 answers

Related Questions in ELASTICSEARCH

Related Questions in ELASTICSEARCH-AGGREGATION

Related Questions in ELASTICSEARCH-NESTED

Related Questions in ELASTICSEARCH-SCRIPTING

Popular Questions

Trending Questions