Elasticsearch relevance- documents with similar names

Question

126 views Asked by vishnu At 06 September 2017 at 06:57

I am looking for an approach to deal with Elasticsearch's relevance scoring for document names like "bottle" and "bottle caps".

When someone searches for "bottle", "bottle caps" should be scored lower than "Red bottles".

Currently our search engine scores "red coloured bottle" as less relevant than "Bottle caps for 500ml bottle".

There is 1 answer

**dshockley** · Accepted Answer · 2017-09-06T07:57:36+00:00

This is not something you can solve in Elasticsearch without adding more information. You want to rank "red bottles" over "bottle caps" because you know semantic information about these names -- you know that "red bottles" describes a bottle, while "bottle caps" describes something else (related to bottles, but not actually a bottle). If you want ranking from Elasticsearch to take this information into account, you have to index it (maybe add a keyword tag field, with the value "bottle" for one document and "bottle caps" for the other -- you will have to experiment to see what works for your use case). Of course, this means that a person has to add tags for everything.
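The tag idea could be sketched roughly as follows. This is only an illustration under assumptions: the index name `myindex_tagged`, the field name `product_type`, and the boost value of 2.0 are all hypothetical, and the mapping uses the same `doc` mapping type as the rest of this answer. The `should` clause does not filter anything out; it only adds to the score of documents whose tag matches exactly.

```json
# Hypothetical index with an extra keyword field holding a hand-assigned tag
PUT /myindex_tagged
{
  "mappings": {
    "doc": {
      "properties": {
        "name": { "type": "text" },
        "product_type": { "type": "keyword" }
      }
    }
  }
}

# The tag values are assigned by a person, not derived by Elasticsearch
PUT /myindex_tagged/doc/1
{"name": "Red coloured bottles", "product_type": "bottle"}

PUT /myindex_tagged/doc/2
{"name": "Bottle caps for 500ml bottle", "product_type": "bottle caps"}

# Match on the name as before, and boost documents tagged exactly "bottle"
POST /myindex_tagged/_search
{
  "query": {
    "bool": {
      "must": { "match": { "name": "bottle" } },
      "should": {
        "term": { "product_type": { "value": "bottle", "boost": 2.0 } }
      }
    }
  }
}
```

Because `product_type` is a keyword field, the term query compares the whole value, so the tag "bottle caps" does not match "bottle". The boost is a starting point to tune, not a recommended setting.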

However, I suspect you can improve the situation somewhat with the unique token filter. My guess is that you don't care much about term frequency within a single title ("Bottle caps for 500ml bottle" isn't more about bottles just because "bottle" appears in it twice -- term frequency makes little sense for titles like this, I think). So you could do something like this:

PUT /myindex
{
  "settings": {
    "index": {
      "number_of_shards": 1
    },
    "analysis": {
      "analyzer": {
        "uniq_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "porter_stem",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "uniq_analyzer"
        }
      }
    }
  }
}

PUT /myindex/doc/1
{"name": "Red coloured bottles"}

PUT /myindex/doc/2
{"name": "Bottle caps for 500ml bottle"}

Then, if you search for "bottle", you'll see the scores are identical -- not perfect, but an improvement. If you want to understand where a score is coming from, you can set explain to true in the search request:

POST /myindex/_search
{
  "explain": true,
  "query": {
    "match": {
      "name": "bottle"
    }
  }
}
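To check what the analyzer actually emits for a title, the `_analyze` API can be pointed at it. A minimal sketch, assuming the `myindex` index and `uniq_analyzer` from the settings above have already been created:

```json
# Ask the index to run a sample title through uniq_analyzer
POST /myindex/_analyze
{
  "analyzer": "uniq_analyzer",
  "text": "Bottle caps for 500ml bottle"
}
```

If the unique filter is doing its job, the stemmed token for "bottle" should appear only once in the response, even though the word occurs twice in the title.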
