Elasticsearch: ignore accents on search

I have an Elasticsearch index with customer information.

I'm having trouble searching for values that contain accents.

For example, I have two documents, {name: 'anais'} and {name: 'anaïs'}.

Running this query:

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anaïs"}
  }
}

I would like this query to return both documents, but it only returns anaïs.

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anais"}
  }
}

Here too I would like to get both anais and anaïs, but I only get anais.

I tried adding an analyzer:

PUT /my-new-celebrity/_settings
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

But with this analyzer in place, both searches return only anais.

Best answer, by Amit

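It looks like your custom analyzer is not taking effect on the name field. Make sure the field's mapping uses it, and that the documents were indexed after the analyzer was in place: analysis settings only affect documents indexed afterwards, so documents indexed earlier keep their old tokens until they are reindexed. Below is a working example: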

Index definition with settings and mapping (note the "analyzer" on the name field)

PUT /myindexascii
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    "mappings" : {
        "properties" :{
            "name" : {
                "type" : "text",
                "analyzer" : "default"
            }
        }
    }
}
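
To see why this analyzer makes the two spellings equivalent, here is a rough Python approximation of the chain (standard tokenizer, then lowercase, then asciifolding). It is only a sketch under assumptions: the function names are made up for illustration, the tokenizer is a crude regex, and the folding relies on Unicode decomposition, whereas Elasticsearch's real asciifolding filter covers many more characters.

```python
import re
import unicodedata
from typing import List


def fold_to_ascii(token: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Rough stand-in for: standard tokenizer -> lowercase -> asciifolding."""
    tokens = re.findall(r"\w+", text)  # crude word tokenizer, not the real one
    return [fold_to_ascii(token.lower()) for token in tokens]


def match(query: str, docs: List[str]) -> List[str]:
    """Return the docs that share at least one analyzed token with the query."""
    wanted = set(analyze(query))
    return [doc for doc in docs if wanted & set(analyze(doc))]


docs = ["anais", "ana\u00efs"]  # the second value is "anaïs"

print(analyze("Ana\u00efs"))     # ['anais']
print(match("anais", docs))      # both documents
print(match("ana\u00efs", docs)) # both documents
```

The point of the sketch is that the same chain runs on the indexed value and on the query text, so both sides end up as the token anais. A document indexed before the analyzer existed would still hold the unfolded token, which would explain a search that only returns anais.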

Index the sample documents

PUT /myindexascii/_doc/2
{
   "name" : "anais"
}

PUT /myindexascii/_doc/1
{
   "name" : "anaïs"
}

Search query, same as yours

GET /myindexascii/_search
{
    "size": 25,
    "query": {
        "match": {
            "name": "anaïs"
        }
    }
}

And, as expected, both documents are returned

"hits": [
    {
        "_index": "myindexascii",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
            "name": "anaïs"
        }
    },
    {
        "_index": "myindexascii",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
            "name": "anais"
        }
    }
]
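
If the results still differ on your side, it may help to ask Elasticsearch which tokens the name field actually produces, using the _analyze API. The Python sketch below only builds and sends that request with the standard library; the helper names and the http://localhost:9200 address are assumptions for illustration, so adjust them to your cluster.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(base_url: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Build a POST to /<index>/_analyze asking how `field` would tokenize `text`."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyzed_tokens(request: urllib.request.Request) -> List[str]:
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [entry["token"] for entry in payload["tokens"]]


# Hypothetical local cluster; nothing is sent until analyzed_tokens() is called.
request = build_analyze_request(
    "http://localhost:9200", "myindexascii", "name", "ana\u00efs"
)
```

Calling analyzed_tokens(request) against the index defined above should return a single folded token for either spelling. If it returns the accented form instead, the analyzer is not being applied to the field.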