Elasticsearch aggregation with reverse_nested path parameter

Could anybody elaborate on the "path" parameter of reverse_nested in an Elasticsearch aggregation? I am trying to aggregate nested buckets using keys at different nesting levels. Here are the details:

I created an index with the following mapping:

PUT agg
{
  "mappings": {
    "sample": {
      "properties": {
        "product": {
          "type": "object",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "category": {
              "type": "keyword"
            }
          }
        },
        "features": {
          "type": "nested",
          "properties": {
            "color": {
              "type": "keyword"
            },
            "details": {
              "type": "text"
            },
            "finish": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

Indexing some documents in the "agg" index:

POST _bulk
{ "index" : { "_index" : "agg", "_type" : "sample", "_id" : "1" } }
{"product":{"name":"tv","category":"electronics"},"features":[{"color":"black","details":"jet black in color"},{"finish":"matte"}]}
{ "index" : { "_index" : "agg", "_type" : "sample", "_id" : "2" } }
{"product":{"name":"tv","category":"electronics"},"features":[{"color":"black","details":"jet black in color"},{"finish":"glossy"}]}
{ "index" : { "_index" : "agg", "_type" : "sample", "_id" : "3" } }
{"product":{"name":"tv","category":"electronics"},"features":[{"color":"red","details":"apple red in color"},{"finish":"matte"}]}
{ "index" : { "_index" : "agg", "_type" : "sample", "_id" : "4" } }
{"product":{"name":"tv","category":"electronics"},"features":[{"color":"red","details":"blood red in color"},{"finish":"matte"}]}

The following aggregation works as expected (each color bucket contains its finish buckets):

GET agg/_search
{
  "size": 0,
  "aggs": {
    "root": {
      "nested": {
        "path": "features"
      },
      "aggs": {
        "colors": {
          "terms": {
            "field": "features.color",
            "size": 10
          },
          "aggs": {
            "colorToFinish": {
              "reverse_nested": {},
              "aggs": {
                "root": {
                  "nested": {
                    "path": "features"
                  },
                  "aggs": {
                    "finishes": {
                      "terms": {
                        "field": "features.finish",
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

However, the following doesn't seem to work as expected:

GET agg/_search
{
  "size": 0,
  "aggs": {
    "root": {
      "nested": {
        "path": "features"
      },
      "aggs": {
        "colors": {
          "terms": {
            "field": "features.color",
            "size": 10
          },
          "aggs": {
            "colorToFinish": {
              "reverse_nested": {
                "path": "features"
              },
              "aggs": {
                "finishes": {
                  "terms": {
                    "field": "features.finish",
                    "size": 10
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

In the non-working DSL, I am trying to step out of the nesting to "features" and then go deeper again to get the finish. This doesn't gather any buckets for "finish".

However, the first approach, where we go back to the root document level and then descend into "features" again from scratch, does work. So it would seem I am not using the "path" parameter of reverse_nested correctly and am possibly not landing at the right nesting level. Would anybody know why the second query doesn't work?

1 answer

Answered by bean:

The nested query/aggregation lets you query/aggregate on nested objects. You have to specify the path that the query/aggregation descends into. The reverse_nested aggregation, on the other hand, lets you jump out of the current nested aggregation. When reverse_nested is used, it already knows which nested object it should jump out of, so no path is required inside reverse_nested.

So in your first query, once reverse_nested is applied, the aggregations under it run at the top level of the document, where 'product' and 'features' live. You then want to aggregate on the nested features.finish field, so you have to go into the nested object again by adding a nested aggregation with a path. After that you do a normal terms aggregation on the nested features.finish field.
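To make the order of the joins concrete, here is a minimal Python sketch that mimics the working query on the four sample documents from the question. It is a toy model of the nested / reverse_nested steps, not Elasticsearch's implementation; the function name `color_to_finish` and the dict-based document layout are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# The four sample documents from the question, reduced to their nested "features".
DOCS = {
    "1": [{"color": "black"}, {"finish": "matte"}],
    "2": [{"color": "black"}, {"finish": "glossy"}],
    "3": [{"color": "red"}, {"finish": "matte"}],
    "4": [{"color": "red"}, {"finish": "matte"}],
}


def color_to_finish(docs):
    """Toy model: nested -> terms(color) -> reverse_nested -> nested -> terms(finish)."""
    # nested + terms(color): bucket the nested objects by color and remember
    # the root document each one belongs to. Keeping the root id models what
    # reverse_nested with no path does: it joins back to the root document.
    roots_by_color = defaultdict(set)
    for doc_id, features in docs.items():
        for feature in features:
            if "color" in feature:
                roots_by_color[feature["color"]].add(doc_id)

    # nested + terms(finish): for each color bucket, re-enter "features" of the
    # matching root documents and count the finish values found there.
    result = {}
    for color, doc_ids in roots_by_color.items():
        finishes = Counter()
        for doc_id in doc_ids:
            for feature in docs[doc_id]:
                if "finish" in feature:
                    finishes[feature["finish"]] += 1
        result[color] = dict(finishes)
    return result


print(color_to_finish(DOCS))
```

In this model the black bucket ends up with one matte and one glossy finish, and the red bucket with two matte finishes, which is the color-to-finish nesting the first query is after.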

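The second query doesn't work for two reasons:
1. The path option of reverse_nested names the nested level to join back to, and it defaults to the root document. Setting it to "features" therefore does not take the aggregation back to the root document, and it is unnecessary here: leave it out.
2. After reverse_nested, a nested aggregation is still required when querying/aggregating on a nested field.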
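The effect of skipping the second nested step can be seen on the sample data itself: in every document the color and the finish sit in separate nested objects. The sketch below is a simplified model, under the assumption that the inner terms aggregation only sees the nested objects already held by each color bucket; `finishes_without_rejoin` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter, defaultdict

# Nested "features" of the four sample documents from the question.
DOCS = {
    "1": [{"color": "black"}, {"finish": "matte"}],
    "2": [{"color": "black"}, {"finish": "glossy"}],
    "3": [{"color": "red"}, {"finish": "matte"}],
    "4": [{"color": "red"}, {"finish": "matte"}],
}


def finishes_without_rejoin(docs):
    """Toy model: terms(color), then terms(finish) on the very same nested objects."""
    buckets = defaultdict(Counter)
    for features in docs.values():
        for feature in features:
            if "color" in feature:
                # Create the color bucket even when it holds no finish value.
                bucket = buckets[feature["color"]]
                # Only this one nested object is inspected; its siblings are not.
                if "finish" in feature:
                    bucket[feature["finish"]] += 1
    return {color: dict(counts) for color, counts in buckets.items()}


print(finishes_without_rejoin(DOCS))
```

Both color buckets come back with no finish counts in this model, because no nested object in the sample data carries a color and a finish at the same time. Going back to the root document and re-entering "features", as the first query does, is what brings the sibling objects into view.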