Return only elements of an array in an object that contain a certain value

1.7k views Asked by At

I've got the following document in an elastic search index:

{
    "type": "foo",
    "components": [{
            "id": "1234123", ,
            "data_collections": [{
                    "date_time": "2020-03-02T08:14:48+00:00",
                    "group": "1",
                    "group_description": "group1",
                    "measures": [{
                            "measure_name": "MEASURE_1",
                            "actual": "23.34"
                        }, {
                            "measure_name": "MEASURE_2",
                            "actual": "5"
                        }, {
                            "measure_name": "MEASURE_3",
                            "actual": "string_message"
                        }, {
                            "measure_name": "MEASURE_4",
                            "actual": "another_string"
                        }
                    ]
                },
                {
                    "date_time": "2020-03-03T08:14:48+00:00",
                    "group": "2",
                    "group_description": "group2",
                    "measures": [{
                            "measure_name": "MEASURE_1",
                            "actual": "23.34"
                        }, {
                            "measure_name": "MEASURE_4",
                            "actual": "foo"
                        }, {
                            "measure_name": "MEASURE_5",
                            "actual": "bar"
                        }, {
                            "measure_name": "MEASURE_6",
                            "actual": "4"
                        }
                    ]
                }
            ]
        }
    ]
}

Now I'm trying to figure out a mapping and a query for this document so the result would only contain the groups and measure_names I am interesed in. So far I'm able to query but I'll always retrieve the whole document which is not feasible since the array of measures can be quite large and most of the time I'd like a small subset.

For example I'm search for documents with "group": "1" and "measure_name": "MEASURE_" and the result I'd like to achieve looks like this:

{
    "_id": "oiqwueou8931283u12",
    "_source": {
        "type": "foo",
        "components": [{
                "id": "1234123", ,
                "data_collections": [{
                        "date_time": "2020-03-02T08:14:48+00:00",
                        "group": "1",
                        "group_description": "group1",
                        "measures": [{
                                "measure_name": "MEASURE_1",
                                "actual": "23.34"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

I think what comes close to what I am looking for is the source parameter, but as far as I know there is no way to filter for values like {"measure_name": {"value": "MEASURE_1"}}

Thanks.

1

There are 1 answers

0
Joe - Check out my books On BEST ANSWER

The simplest mapping that comes to mind is

PUT timo
{
  "mappings": {
    "properties": {
      "components": {
        "type": "nested",
        "properties": {
          "data_collections": {
            "type": "nested",
            "properties": {
              "measures": {
                "type": "nested"
              }
            }
          }
        }
      }
    }
  }
}

and the search query would be

GET timo/_search
{
  "_source": ["inner_hits", "type", "components.id"], 
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "components.data_collections",
            "query": {
              "term": {
                "components.data_collections.group.keyword": {
                  "value": "1"
                }
              }
            },
            "inner_hits": {}
          }
        },
        {
          "nested": {
            "path": "components.data_collections.measures",
            "query": {
              "term": {
                "components.data_collections.measures.measure_name.keyword": {
                  "value": "MEASURE_1"
                }
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

Notice the inner_hits param under each subquery and that the _source param is limited so that we don't return the whole hit, but rather only the subgroups that did match. type and component.id cannot be "seen" in the nested fields so we've included them explicitly.

The response should then look like this: enter image description here

You now have precisely the attributes you need so a bit of post-processing will get you the desired format!


I'm not familiar w/ a cleaner way of doing this but if any of y'all do, I'd be glad to learn it.