Custom analyzer doesn't work when searching Elasticsearch

I created an index named documents with type _doc, then set up a custom analyzer as follows:

POST /documents/_close
PUT /documents/_settings
{
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "word_delimiter_graph"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "question": {
                "type": "text",
                "analyzer": "custom_analyzer"
            },
            "question_group": {
                "type": "text",
                "analyzer": "custom_analyzer"
            }
        }
    }
}
POST /documents/_open

When I test custom_analyzer with the _analyze API, it works:

POST http://localhost:9200/documents/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "FIRE_DETECTED"
}
# And the result (lowercased, with _ removed)
{
    "tokens": [
        {
            "token": "fire",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "detected",
            "start_offset": 5,
            "end_offset": 13,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

But searching for "fire" or "fire detected" returns nothing.

Searching for "fire_detected" still matches (the indexed value is "FIRE_DETECTED").

# This search returns nothing
POST /documents/_search
{
    "query": {
        "multi_match": {
            "query": "fire detected",
            "fields": [
                "question^2",
                "question_group"
            ]
        }
    }
}

Solution

Try creating a new index with the new settings (above):

PUT /documents5
{
    "settings": {...}
}

Index data

PUT http://localhost:9200/documents5/_doc/1
{
  "question": "fire_detected"
}

Search

There is 1 answer

Answered by Amit

This happened because you only added the definition of custom_analyzer to your index but didn't reindex the data (the documents already in the index), so the new tokens are not present in the inverted index. To fix the issue, reindex the documents that you want to appear in your search results.
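One way to do that reindexing is to create a fresh index with the analyzer and mappings in place, then copy the documents across with the _reindex API. The following is a minimal sketch, not a definitive implementation. It assumes a cluster at http://localhost:9200 and the index names documents and documents5 from the question; the helper names (new_index_body, reindex_body, send, recreate_and_reindex) are hypothetical and only there for illustration.

```python
import json
import urllib.request

# Assumption: a local cluster, as in the question's URLs.
ES_URL = "http://localhost:9200"


def new_index_body():
    """Settings and mappings from the question, sent together at index creation."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "custom_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "word_delimiter_graph"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "question": {"type": "text", "analyzer": "custom_analyzer"},
                "question_group": {"type": "text", "analyzer": "custom_analyzer"},
            }
        },
    }


def reindex_body(source, dest):
    """Request body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def send(method, path, body):
    """Send one JSON request to the cluster and return the decoded response."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def recreate_and_reindex(source="documents", dest="documents5"):
    """Create dest with the custom analyzer, then re-analyze source's documents into it."""
    send("PUT", "/" + dest, new_index_body())
    return send("POST", "/_reindex", reindex_body(source, dest))
```

Calling recreate_and_reindex() against a running cluster should leave documents5 holding the same documents, tokenized by custom_analyzer at index time. Searches would then need to target documents5 (or an alias pointing at it).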

You are using the multi_match query, which internally uses the match query. These queries are analyzed, so you don't need a separate search-time analyzer.

A match query uses the analyzer defined on the field to create the search tokens (that is, the tokens produced from the search terms).

From the official match query docs:

Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
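To see why analyzed queries still miss documents that were indexed before the analyzer changed, here is a toy model in plain Python. It is only a rough illustration, not how Elasticsearch is implemented: standard_like and custom_like are hypothetical stand-ins that approximate the default analyzer and the custom_analyzer from the question with regular expressions, and the inverted index is a simple dictionary.

```python
import re


def standard_like(text):
    # Rough stand-in for the default analyzer: lowercase and split on
    # non-word characters, keeping underscores inside a token.
    return re.findall(r"\w+", text.lower())


def custom_like(text):
    # Rough stand-in for standard tokenizer + lowercase + word_delimiter_graph:
    # underscores also act as token boundaries.
    return [tok for tok in re.split(r"[\W_]+", text.lower()) if tok]


def build_index(docs, analyze):
    # Inverted index: token -> set of ids of the documents containing it.
    index = {}
    for doc_id, text in docs.items():
        for tok in analyze(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def match(index, query, analyze):
    # The query text is analyzed first; a document matches if it contains
    # any of the resulting tokens (OR semantics, the match query default).
    hits = set()
    for tok in analyze(query):
        hits |= index.get(tok, set())
    return hits


docs = {1: "FIRE_DETECTED"}

# Tokens written before the analyzer change: only "fire_detected" exists.
stale = build_index(docs, standard_like)
print(match(stale, "fire detected", custom_like))  # set(): no hit

# After reindexing, the same document is stored as "fire" and "detected".
fresh = build_index(docs, custom_like)
print(match(fresh, "fire detected", custom_like))  # {1}
print(match(fresh, "fire", custom_like))           # {1}
```

The query side is analyzed in both cases; what changes after reindexing is the set of tokens stored for the document.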