ElasticSearch Filter by sum of nested documents

864 views Asked by At

I am trying to filter products where a sum of properties in the nested filtered objects is in some range.

I have the following mapping:

{
  "product": {
    "properties": {
      "warehouses": {
        "type": "nested",
        "properties": {
          "stock_level": {
            "type": "integer"
          }
        }
      }
    }
  }
}

Example data:

{
  "id": 1,
  "warehouses": [
    {
      "id": 2001,
      "stock_level": 5
    },
    {
      "id": 2002,
      "stock_level": 0
    },
    {
      "id": 2003,
      "stock_level": 2
    }
  ]
}

In ElasticSearch 5.6 I used to do this:

GET products/_search
{
  "query": {
    "bool": {
      "filter": [
        [
          {
            "script": {
              "script": {
                "source": """
int total = 0;
for (def warehouse: params['_source']['warehouses']) {
  if (params.warehouse_ids == null || params.warehouse_ids.contains(warehouse.id)) {
    total += warehouse.stock_level;
  }
}
boolean gte = true;
boolean lte = true;
if (params.gte != null) {
  gte = (total >= params.gte);
}
if (params.lte != null) {
  lte = (total <= params.lte);
}
return (gte && lte);

""",
                "lang": "painless",
                "params": {
                  "gte": 4
                }
              }
            }
          }
        ]
      ]
    }
  }
}

The problem is that params['_source']['warehouses'] no longer works in ES 6.8, and I am unable to find a way to access nested documents in the script.

I have tried:

  • doc['warehouses'] - returns error (“No field found for [warehouses] in mapping with types []" )
  • ctx._source.warehouses - “Variable [ctx] is not defined.”

I have also tried to use scripted_field but it seems that scripted fields are getting calculated on the very last stage and are not available during query.

I also have a sorting by the same logic (sort products by the sum of stocks in the given warehouses), and it works like a charm:

  "sort": {
    "warehouses.stock_level": {
      "order": "desc",
      "mode": "sum",
      "nested": {
        "path": "warehouses"
        "filter": {
           "terms": {
             "warehouses.id": [2001, 2003]
           }
        }
      }
    }
  }

But I can't find a way to access this sort value either :(

Any ideas how can I achieve this? Thanks.

1

There are 1 answers

4
Joe - Check out my books On

I recently had the same issue. It turns out the change occurred somewhere around 6.4 during refactoring and while accessing _source is strongly discouraged, it looks like people are still using / wanting to use it.

Here's a workaround taking advantage of the include_in_root parameter.

  1. Adjust your mapping
PUT product
{
  "mappings": {
    "properties": {
      "warehouses": {
        "type": "nested",
        "include_in_root": true,     <--
        "properties": {
          "stock_level": {
            "type": "integer"
          }
        }
      }
    }
  }
}
  1. Drop & reindex
  2. Reconstruct the individual warehouse items in a for loop while accessing the flattened values:
GET product/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": """
                  int total = 0;
                  
                  def ids = doc['warehouses.id'];
                  def levels = doc['warehouses.stock_level'];
                  
                  for (def i = 0; i <  ids.length; i++) {
                    def warehouse = ['id':ids[i], 'stock_level':levels[i]];
                    
                    if (params.warehouse_ids == null || params.warehouse_ids.contains(warehouse.id)) {
                      total += warehouse.stock_level;
                    }
                  }
                  
                  boolean gte = true;
                  boolean lte = true;
                  if (params.gte != null) {
                    gte = (total >= params.gte);
                  }
                  if (params.lte != null) {
                    lte = (total <= params.lte);
                  }
                  return (gte && lte);
              """,
              "lang": "painless",
              "params": {
                  "gte": 4
              }
            }
          }
        }
      ]
    }
  }
}

Be aware that this approach assumes that all warehouses include a non-null id and stock level.