ES query to match all elements in array


I have a document with a nested array that I want to filter with the query below.

I want ES to return only the documents in which every item has changes = 0. If a document has even a single item in the list with changes = 1, it should be discarded.

Is there any way to achieve this starting from the query I have already written, or should I use a script instead?

DOCUMENTS:

{
    "id": "abc",
    "_source" : {
        "trips" : [
            {
                "type" : "home",
                "changes" : 0
            },
            {
                "type" : "home",
                "changes" : 1
            }
        ]
    }
},
{
    "id": "def",
    "_source" : {
        "trips" : [
            {
                "type" : "home",
                "changes" : 0
            },
            {
                "type" : "home",
                "changes" : 0
            }
        ]
    }
}

QUERY:

GET trips_solutions/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": {
              "value": "abc"
            }
          }
        },
        {
          "nested": {
            "path": "trips",
            "query": {
              "range": {
                "trips.changes": {
                  "gt": -1,
                  "lt": 1
                }
              }
            }
          }
        }
      ]
    }
  }
}

EXPECTED RESULT:

{
    "id": "def",
    "_source" : {
        "trips" : [
            {
                "type" : "home",
                "changes" : 0
            },
            {
                "type" : "home",
                "changes" : 0
            }
        ]
    }
}

Elasticsearch version: 7.6.2

I have already read these answers, but they didn't help me: https://discuss.elastic.co/t/how-to-match-all-item-in-nested-array/163873 and "ElasticSearch: How to query exact nested array".

Best answer, by Joe:

First off, if you filter by id: abc, you obviously won't be able to get id: def back.

Second, because nested fields are indexed as separate sub-documents, a nested query cannot require that all trips have changes equal to 0. It matches the parent as soon as any single trip matches: the connection between the individual trips is lost and they "don't know about each other".
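To make the "any" versus "all" distinction concrete, here is a minimal Python sketch that emulates the two matching semantics on the sample documents from the question. It does not talk to Elasticsearch; `nested_any_match` and `all_items_match` are hypothetical helper names used only for illustration.

```python
# Plain-Python emulation of the matching semantics; no Elasticsearch client involved.
docs = {
    "abc": [{"type": "home", "changes": 0}, {"type": "home", "changes": 1}],
    "def": [{"type": "home", "changes": 0}, {"type": "home", "changes": 0}],
}

def nested_any_match(trips):
    # A nested query matches the parent as soon as ONE sub-document matches.
    return any(trip["changes"] == 0 for trip in trips)

def all_items_match(trips):
    # What the question asks for: EVERY trip must have changes == 0.
    # Require at least one trip, because all() is True for an empty list.
    return bool(trips) and all(trip["changes"] == 0 for trip in trips)

any_hits = [doc_id for doc_id, trips in docs.items() if nested_any_match(trips)]
all_hits = [doc_id for doc_id, trips in docs.items() if all_items_match(trips)]
print(any_hits)  # ['abc', 'def']
print(all_hits)  # ['def']
```

The nested query alone behaves like `nested_any_match`, which is why document abc still comes back even though one of its trips has changes = 1.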

What you can do is return only the trips that matched your nested query using inner_hits:

GET trips_solutions/_search
{
  "_source": false,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "inner_hits": {},
            "path": "trips",
            "query": {
              "term": {
                "trips.changes": {
                  "value": 0
                }
              }
            }
          }
        }
      ]
    }
  }
}
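As a rough picture of what this returns, the following Python sketch emulates inner_hits on the question's sample data. It is only an illustration under the assumption that the two documents are indexed as shown; `matching_trips` is a hypothetical helper, not an Elasticsearch API.

```python
# Emulates the effect of inner_hits: each hit carries only the trips that matched.
docs = {
    "abc": [{"type": "home", "changes": 0}, {"type": "home", "changes": 1}],
    "def": [{"type": "home", "changes": 0}, {"type": "home", "changes": 0}],
}

def matching_trips(trips):
    # Keep only the nested sub-documents that satisfy the term query.
    return [trip for trip in trips if trip["changes"] == 0]

# A parent is a hit if at least one of its trips matched.
hits = {}
for doc_id, trips in docs.items():
    matched = matching_trips(trips)
    if matched:
        hits[doc_id] = matched

for doc_id, matched in hits.items():
    print(doc_id, len(matched))
```

Both parents are still returned; abc simply carries one matching trip and def carries two, so inner_hits on its own does not discard abc.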

The easiest solution is then to copy this nested information onto the parent document at index time, as discussed here, and run a range/term query on the resulting array.


EDIT:

Here's how you do it using copy_to onto the doc's top level:

PUT trips_solutions
{
  "mappings": {
    "properties": {
      "trips_changes": {
        "type": "integer"
      },
      "trips": {
        "type": "nested",
        "properties": {
          "changes": {
            "type": "integer",
            "copy_to": "trips_changes"
          }
        }
      }
    }
  }
}

trips_changes will hold every copied changes value as an array of numbers. I presume they're integers, but other numeric types are available.

Then index a few docs:

POST trips_solutions/_doc
{"trips":[{"type":"home","changes":0},{"type":"home","changes":1}]}

POST trips_solutions/_doc
{"trips":[{"type":"home","changes":0},{"type":"home","changes":0}]}

And finally querying:

GET trips_solutions/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "trips",
            "query": {
              "term": {
                "trips.changes": {
                  "value": 0
                }
              }
            }
          }
        },
        {
          "script": {
            "script": {
              "source": "doc.trips_changes.stream().filter(val -> val != 0).count() == 0"
            }
          }
        }
      ]
    }
  }
}

Note that we first filter with the regular nested term query to narrow down the search context (scripts are slow, so this is useful). The script then checks the accumulated top-level trips_changes values and rejects any document that contains a non-zero value.
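The two-step logic can be sketched in plain Python as follows. This is an emulation under stated assumptions rather than real Elasticsearch behaviour: `index_doc` and `matches` are hypothetical names, and the copied values are attached to the document only for illustration (in Elasticsearch, copy_to values live in the index and do not appear in _source).

```python
# Emulates copy_to plus the two query clauses; no cluster is needed to run this.
def index_doc(source):
    # copy_to gathers every trips.changes value into a top-level trips_changes list.
    doc = dict(source)
    doc["trips_changes"] = [trip["changes"] for trip in source["trips"]]
    return doc

def matches(doc):
    # Clause 1 (nested term query): at least one trip has changes == 0.
    if not any(trip["changes"] == 0 for trip in doc["trips"]):
        return False
    # Clause 2 (script): count the non-zero copied values and require zero of them.
    return sum(1 for val in doc["trips_changes"] if val != 0) == 0

indexed = [
    index_doc({"trips": [{"type": "home", "changes": 0}, {"type": "home", "changes": 1}]}),
    index_doc({"trips": [{"type": "home", "changes": 0}, {"type": "home", "changes": 0}]}),
]
results = [matches(doc) for doc in indexed]
print(results)  # [False, True]
```

Only the second document passes both clauses, which corresponds to the expected result in the question.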