ES query to match as many words from the query as possible


I have a few million documents in my index. I have a sentence and want to retrieve the documents that match as many of its words as possible. I need to search only one field, content.

curl -X GET "xxx.com:9200/test/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "bool" : { "must" : [{"term": {"content": {"value": "popular artworks of Banksy"}}}]
    }}
}
'

I want the document that contains as many words from the query as possible, the more the better. A document whose text has many occurrences of artwork and Banksy, and a few of popular, should score high. Additionally, is it possible to give less weight to a match on a word that occurs more commonly than others? For example, more weight to Banksy than to popular. I understand that I could use boost, but I don't want to set these values manually. I want this to happen implicitly if possible.

Answer by ESCoder (best answer):

Adding a working example with index data, search query, and search result.

Refer to the ES documentation on the match_phrase and bool queries for a detailed explanation.

Index Data:

{
    "content":"popular popular popular artworks artworks Banksy"
}
{
    "content":"popular artworks Banksy"
}
{
    "content":"popular popular artworks Banksy"
}
{
    "content": "popular artworks Banksy Banksy"
}
{
    "content": "popular popular popular artworks artworks artworks Banksy"
}

Search Query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": "popular artworks of Banksy"
          }
        },
        {
          "match_phrase":{
            "content":"popular artworks Banksy Banksy"
          }
        }
      ]
    }
  }
}
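
For context on why the query above uses match rather than term: a term query does not analyze its input, so the whole sentence is looked up as one exact token, whereas match analyzes the sentence into words and, with its default OR operator, scores documents higher the more of those words they contain. The same request body could be built and sent from Python roughly as follows. This is only a sketch using the standard library; the helper names build_query and search, the base URL, and the index name passed in the usage block are assumptions for illustration, not part of the original answer.

```python
import json
import urllib.request


def build_query(sentence: str, phrase: str, field: str = "content") -> dict:
    """Build the bool/should body shown above (hypothetical helper).

    The match clause rewards documents containing more of the words in
    `sentence`; the match_phrase clause adds score when `phrase` occurs
    as an exact sequence of terms.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: sentence}},
                    {"match_phrase": {field: phrase}},
                ]
            }
        }
    }


def search(base_url: str, index: str, body: dict) -> dict:
    """POST the body to <base_url>/<index>/_search and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_query(
        "popular artworks of Banksy",
        "popular artworks Banksy Banksy",
    )
    # Assumed local cluster and index name; adjust both for your setup.
    result = search("http://localhost:9200", "test1", body)
    for hit in result["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["content"])
```

If the phrase clause is not needed, the match clause alone already ranks documents by how many query words they contain.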

Search Result:

"hits": [
      {
        "_index": "test1",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.4776722,
        "_source": {
          "content": "popular artworks Banksy Banksy"
        }
      },
      {
        "_index": "test1",
        "_type": "_doc",
        "_id": "5",
        "_score": 0.22413516,
        "_source": {
          "content": "popular popular popular artworks artworks artworks Banksy"
        }
      },
      {
        "_index": "test1",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.22279418,
        "_source": {
          "content": "popular popular popular artworks artworks Banksy"
        }
      },
      {
        "_index": "test1",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.21652403,
        "_source": {
          "content": "popular popular artworks Banksy"
        }
      },
      {
        "_index": "test1",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.21318543,
        "_source": {
          "content": "popular artworks Banksy"
        }
      }
    ]
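
On the second part of the question (less weight for common words without manual boosts): Elasticsearch's default similarity is BM25, which already includes an inverse document frequency factor, so a word that appears in fewer documents contributes more to the score than one that appears in many. The sketch below shows the textbook BM25 formula to illustrate that effect and the saturating effect of repeated terms. It is not Elasticsearch's actual implementation, its numbers are not expected to reproduce the scores above exactly, and the function names and the document counts in the usage block are invented for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF factor: ln(1 + (N - n + 0.5) / (n + 0.5)).

    N is the number of documents, n the number containing the term.
    Rarer terms (small n) get a larger value.
    """
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency factor: grows with freq but saturates below k1 + 1.

    Documents longer than average are penalized through b.
    """
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1.0) / (freq + norm)


def bm25_score(query_terms, doc_tokens, doc_freqs, doc_count, avg_doc_len):
    """Sum idf * tf over the query terms that occur in the document."""
    score = 0.0
    for term in set(query_terms):
        freq = doc_tokens.count(term)
        if freq == 0:
            continue  # a missing term simply contributes nothing
        idf = bm25_idf(doc_count, doc_freqs[term])
        score += idf * bm25_tf(freq, len(doc_tokens), avg_doc_len)
    return score


if __name__ == "__main__":
    # Hypothetical corpus statistics, chosen only to show the effect.
    doc_count = 1_000_000
    doc_freqs = {"popular": 200_000, "artworks": 20_000, "banksy": 500}
    for term, df in doc_freqs.items():
        print(term, round(bm25_idf(doc_count, df), 3))
```

With statistics like these, a match on banksy is weighted well above a match on popular, which is the implicit behavior asked for.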