Elasticsearch DSL queries - optional should terms & scores


I'm pretty new to the Elasticsearch world and I might be missing some concepts.

Here's the scenario I don't understand:

I want to find docs matching the following criteria:

  • category.level = A
  • category.name = "John G." OR "Chris T."
  • approved = yes (optional)

Mappings:

PUT data
{
  "mappings": {
    "properties": {
      "createdAt": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "category": {
        "type": "nested",
        "properties": {
          "name": {
            "type":   "text",
            "analyzer": "keyword"
          }
        }
      },
      "approved": {
        "type":   "text",
        "analyzer": "keyword"
      }
    }
  }
}

Data:

POST data/_create/1
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Mary F.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "yes"
}

POST data/_create/2
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "no"
}

POST data/_create/3
{  
  "category": [
      {
        "name": "John G.",
        "level": "C"
      },
      {
        "name": "Phil C.",
        "level": "C"
      }
  ],
  "createdBy": "John",
  "createdAt": "2022-04-18 19:09:27.527+0200",
  "approved": "no"
}

POST data/_create/4
{  
  "category": [
      {
        "name": "John G.",
        "level": "A"
      },
      {
        "name": "Chris T.",
        "level": "A"
      }
  ],
  "createdBy": "John",
  "createdAt": "2020-04-18 19:09:27.527+0200",
  "approved": "yes"
}

POST data/_create/5
{  
  "category": [
      {
        "name": "Unknown A.",
        "level": "A"
      },
      {
        "name": "Unknown B.",
        "level": "A"
      }
  ],
  "createdBy": "Unknown",
  "createdAt": "2020-08-18 19:09:27.527+0200",
  "approved": "yes"
}

Query:

GET data/_search
{
  "query": {
    "nested": {
      "path": "category",
      "query": {
        "bool": {
          "must": [
            {"match": {"category.level": "A"}}
          ],
          "should": [
            {"term": {"category.name": "John G."}},
            {"term": {"category.name": "Chris T."}},
            {"term": {"approved": "yes"}}
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.4455402,
    "hits" : [
      {
        "_index" : "data",
        "_id" : "2",
        "_score" : 1.4455402,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "no"
        }
      },
      {
        "_index" : "data",
        "_id" : "4",
        "_score" : 1.4455402,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2020-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      },
      {
        "_index" : "data",
        "_id" : "1",
        "_score" : 1.151647,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Mary F.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      }
    ]
  }
}

Questions:

  1. Why is the first document returned one with approved = no? I was expecting docs with approved = yes to score higher.
  2. Why is the doc with _id = 5 not returned? It doesn't meet the category.name criterion, but it does meet approved = yes.
  3. The optionality of approved = yes is not expressed in the above query. How could I add a separate should clause with minimum_should_match: 0, something that would increase the score but not filter the results?

There are 2 answers

Sagar Patel (best answer)

Use the query below. It has a main bool query whose must clause holds the nested query. Inside the nested query, a bool query requires category.level and contains another bool query with should clauses for category.name.

The main bool query also has a should clause for approved, placed outside the nested query, which boosts results whose value is yes.

POST data/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "category",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "category.level": {
                        "value": "a"
                      }
                    }
                  },
                  {
                    "bool": {
                      "should": [
                        {
                          "term": {
                            "category.name": "John G."
                          }
                        },
                        {
                          "term": {
                            "category.name": "Chris T."
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "approved": "yes"
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.9845366,
    "hits" : [
      {
        "_index" : "data",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.9845366,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2020-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      },
      {
        "_index" : "data",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.6906434,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Mary F.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "yes"
        }
      },
      {
        "_index" : "data",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4455402,
        "_source" : {
          "category" : [
            {
              "name" : "John G.",
              "level" : "A"
            },
            {
              "name" : "Chris T.",
              "level" : "A"
            }
          ],
          "createdBy" : "John",
          "createdAt" : "2022-04-18 19:09:27.527+0200",
          "approved" : "no"
        }
      }
    ]
  }
}

Why is the first document returned one with approved = no? I was expecting docs with approved = yes to score higher.

Because your approved clause sits inside the nested query. approved is a top-level field, outside category, so that clause never matches any nested document and therefore does not change the score.
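
One way to check this, assuming the data index and mapping from the question: send the body below to data/_search. It is only an illustrative sketch, not part of the original query. It places the approved clause alone inside the nested query, and it should come back with no hits, because the nested category objects carry only name and level.

```json
{
  "query": {
    "nested": {
      "path": "category",
      "query": {
        "term": {
          "approved": "yes"
        }
      }
    }
  }
}
```

Moving the same term clause out of the nested block, to the top level of the query, lets it match the documents with approved = yes.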

Why is the doc with _id = 5 not returned? It doesn't meet the category.name criterion, but it does meet approved = yes.

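In your original query it is excluded because none of the should clauses matches it while minimum_should_match is 1; in my query it is excluded by the must clause. If you need the doc with _id = 5 as well, you can use two should clauses at the top level, one for the nested query and one for approved, and that will resolve your issue.

My answer also covers your question 3: a should clause placed next to a must clause is optional, so it only adds to the score and does not filter results.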
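
A minimal sketch of that two-should variant, reusing the nested block from my query above (send it to data/_search; treat it as an illustration under the question's mapping rather than a tested final query). Since the top-level bool has only should clauses, a document is returned when it matches at least one of them.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "category",
            "query": {
              "bool": {
                "must": [
                  { "term": { "category.level": { "value": "a" } } },
                  {
                    "bool": {
                      "should": [
                        { "term": { "category.name": "John G." } },
                        { "term": { "category.name": "Chris T." } }
                      ]
                    }
                  }
                ]
              }
            }
          }
        },
        { "term": { "approved": "yes" } }
      ]
    }
  }
}
```

With the sample data, the doc with _id = 5 would now match through the approved clause alone, while documents matching both clauses should score higher than those matching only one.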

Amit

I tried your scenario with your mapping and sample data and found the issue: you are using approved:yes in the nested query context. If you change the query to the one below (basically using approved:yes in the should block, but outside the nested query), it solves all your issues.

{
    "query": {
        "bool": {
            "should": [
                {
                    "nested": {
                        "path": "category",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "match": {
                                            "category.level": "A"
                                        }
                                    }
                                ],
                                "should": [
                                    {
                                        "term": {
                                            "category.name": "John G."
                                        }
                                    },
                                    {
                                        "term": {
                                            "category.name": "Chris T."
                                        }
                                    }
                                ]
                            }
                        }
                    }
                },
                {
                    "term": {
                        "approved": "yes"
                    }
                }
            ]
        }
    }
}

And the search result:

"hits": [
            {
                "_index": "71967271",
                "_id": "4",
                "_score": 1.9845366,
                "_source": {
                    "category": [
                        {
                            "name": "John G.",
                            "level": "A"
                        },
                        {
                            "name": "Chris T.",
                            "level": "A"
                        }
                    ],
                    "createdBy": "John",
                    "createdAt": "2020-04-18 19:09:27.527+0200",
                    "approved": "yes"
                }
            },
            {
                "_index": "71967271",
                "_id": "2",
                "_score": 1.4455402,
                "_source": {
                    "category": [
                        {
                            "name": "John G.",
                            "level": "A"
                        },
                        {
                            "name": "Chris T.",
                            "level": "A"
                        }
                    ],
                    "createdBy": "John",
                    "createdAt": "2022-04-18 19:09:27.527+0200",
                    "approved": "no"
                }
            },
            {
                "_index": "71967271",
                "_id": "1",
                "_score": 1.2437345,
                "_source": {
                    "category": [
                        {
                            "name": "John G.",
                            "level": "A"
                        },
                        {
                            "name": "Mary F.",
                            "level": "A"
                        }
                    ],
                    "createdBy": "John",
                    "createdAt": "2022-04-18 19:09:27.527+0200",
                    "approved": "yes"
                }
            },
            {
                "_index": "71967271",
                "_id": "5",
                "_score": 0.7968255,
                "_source": {
                    "category": [
                        {
                            "name": "Unknown A.",
                            "level": "A"
                        },
                        {
                            "name": "Unknown B.",
                            "level": "A"
                        }
                    ],
                    "createdBy": "Unknown",
                    "createdAt": "2020-08-18 19:09:27.527+0200",
                    "approved": "yes"
                }
            }
        ]