Elastic Search Query for Multi-valued Data

106 views Asked by At

ES Data is indexed like this :

 {
  "addresses" : [
                {
                 "id" : 69,
                 "location": "New Delhi"
                },
               {
               "id" : 69,
               "location": "Mumbai"
               }
            ],
  "goods" : [
            {
            "id" : 396,
            "name" : "abc",
            "price" : 12500
            },
           {
           "id" : 167,
           "name" : "XYz",
           "price" : 12000
           },
           {
            "id" : 168,
            "name" : "XYz1",
            "price" : 11000
           },
           {
            "id" : 169,
            "name" : "XYz2",
            "price" : 13000
          }
        ]
      }

In my query I want to fetch records which should have at-least one of the address matched and goods price range between 11000 and 13000 and name xyz.

1

There are 1 answers

0
Andreas Jägle On

When your data contains arrays of complex objects like a list of addresses or a list of goods, you probably want to have a look at elasticsearch's nested objects to avoid running into problems when your queries result in more items than you would expect.

The issue here is the way how elasticsearch (and in effect lucene) stores the data. As there is no such concept of lists of nested objects directly, the data is flattened and the connection between e.g. XYz and 12000 is lost. So you would also get this document as result when you query for XYz and 12500 as the price of 12500 is also there in the list of values for goods.price. To avoid this, you can use the nested objects feature of elasticsearch which basically extracts all inner objects into a hidden index and allows querying for several fields that occur in one specific object instead of "in any of the objects". For more details, have a look at the docs on nested objects which also explains this pretty good.

In your case a mapping could look like the following. I assume, you only want to query for the addresses.location text without providing the id, so that this list can remain the simple object type instead of also being a nested type. Also, I assume you query for exact matches. If this is not the case, you need to switch from keyword to text and adapt the term query to be some match one...

PUT nesting-sample
{
  "mappings": {
    "item": {
      "properties": {
        "addresses": {
          "properties": {
            "id": {"type": "integer"},
            "location": {"type": "keyword"}
          }
        },
        "goods": {
          "type": "nested",
          "properties": {
            "id": {"type": "integer"},
            "name": {"type": "keyword"},
            "price": {"type": "integer"}
          }
        }
      }
    }
  }
}

You can then use a bool query on the location and a nested query to match the inner documents of your goods list.

GET nesting-sample/item/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "addresses.location": "New Delhi"
          }
        },
        {
          "nested": {
            "path": "goods",
            "query": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "goods.price": {
                        "gte": 12200,
                        "lt": 12999
                      }
                    }
                  },
                  {
                    "term": {
                      "goods.name": {
                        "value": "XYz"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

This query will not match the document because the price range is not in the same nested object as the exact name of the good. If you change the lower bound to 12000 it will match.

Please check your use case and be aware of the warning on the bottom of the docs regarding the mapping explosion when using nested fields.