Unable to combine "bool" query with "knn" query - Elasticsearch

I am trying to write an Elasticsearch query to fetch results from my index for a search engine project, using approximate kNN search. I have a couple of conditions that I want to add to my kNN query, but they don't seem to work as expected.

Here is a sample of my full query.

{
    "_source": {
        "includes": [
            "id",
            "name"
        ]
    },
    "from": 0,
    "size": 60,
    "query": {
        "bool": {
            "must_not": [
                {
                    "term": {
                        "id": 12345
                    } 
                }
            ]
        }
    },
    "knn": {
        "field": "text_embedding.predicted_value",
        "k": 100,
        "num_candidates": 300,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                "model_text": "Loreum Epsom"
            }
        }
    }
}

The "id" field is mapped as an integer in my Elasticsearch index. I expected the results to exclude the document with "id" = 12345, but that document is still returned. What's wrong?

There are 2 answers

Val (BEST ANSWER)

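A top-level "query" does not filter the "knn" results: the two sections are combined as a disjunction (a boolean OR), so a document rejected by the bool query can still come back as a kNN hit. If you want to exclude a specific set of documents, you need to use a filtered knn query (available since ES 8.4), with the condition placed in the "filter" option inside the "knn" section: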

{
  "_source": {
    "includes": [
      "id",
      "name"
    ]
  },
  "from": 0,
  "size": 60,
  "knn": {
    "field": "text_embedding.predicted_value",
    "k": 100,
    "num_candidates": 300,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__all-minilm-l6-v2",
        "model_text": "Loreum Epsom"
      }
    },
    "filter": {
      "bool": {
        "must_not": [
          {
            "term": {
              "id": 12345
            }
          }
        ]
      }
    }
  }
}
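
If the request body is built in code, the same filtered kNN body can be assembled programmatically. The following is only a minimal sketch, not part of the original answer: the helper name `build_filtered_knn_body` and its parameters are hypothetical, and it uses a "terms" query so that several ids can be excluded at once. It only builds and prints the JSON body; sending it to a cluster is left out.

```python
import json


def build_filtered_knn_body(query_text, exclude_ids, k=100, num_candidates=300, size=60):
    """Build a search body whose kNN clause excludes the given ids.

    Hypothetical helper for illustration; field and model names are the
    ones used in the question.
    """
    return {
        "_source": {"includes": ["id", "name"]},
        "from": 0,
        "size": size,
        "knn": {
            "field": "text_embedding.predicted_value",
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": "sentence-transformers__all-minilm-l6-v2",
                    "model_text": query_text,
                }
            },
            # The filter lives inside "knn", so it restricts the vector
            # search itself rather than a separate top-level query.
            "filter": {
                "bool": {
                    "must_not": [{"terms": {"id": list(exclude_ids)}}]
                }
            },
        },
    }


body = build_filtered_knn_body("Loreum Epsom", [12345])
print(json.dumps(body, indent=2))
```

Note that the body has no top-level "query" key: the exclusion is carried entirely by the "filter" option of the "knn" section.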

Kapila Shobit

The issue lies in the usage of "term" query within the "must_not" clause. The "term" query is designed for exact matching against string fields, whereas your "id" field is mapped as an integer. This mismatch is causing the unexpected inclusion of the document with "id" = 12345 in the results. To effectively exclude the document with "id" = 12345, you should use the "range" query instead of the "term" query. The "range" query is suitable for numeric fields and allows you to specify a range of values to exclude.

Change the must_not clause as follows (both bounds are inclusive, so this range matches only 12345):

    "must_not": [
        {
            "range": {
                "id": {
                    "gte": 12345,
                    "lte": 12345
                }
            }
        }
    ]

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

There is a particular section at this link that gives a specific example of how to use the range query to exclude documents based on a numeric value.

Hope this answer helps.