I'm using Elasticsearch term suggester
for spell correction. my index contains huge list of ads. Each ad has subject and body fields. I've found a problematic example for which the suggester is not suggesting correct suggestions.
I have lots of ads whose subject contains word "soffa" and also 5 ads whose subject contain word "sofa". Ideally, when I send "sofa" (wrong spelling) as text to suggester, it should return "soffa" (correct spelling) as suggestions (since soffa is correct spell and most of ads contains "soffa" and only few ads contains "sofa" (wrong spell)).
Here is my suggester query body :
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"suggest_mode": "popular",
"min_word_length": 1
}
}
}
}
When I send above query, I get below response :
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
}
As you see in above response, it returned "soff" but not "soffa" although I have lots of docs whose subject contains "soffa".
I even played with parameters like suggest_mode
and string_distance
but still no luck.
I also used phrase suggester
instead of term suggester
but still same. Here is my phrase suggester query :
{
"suggest": {
"text": "sofa",
"subjectuggester": {
"phrase": {
"field": "subject",
"size": 10,
"gram_size": 3,
"direct_generator": [
{
"field": "subject.trigram",
"suggest_mode": "always",
"min_word_length":1
}
]
}
}
}
}
I somehow think it doesn't work when one character is missing instead of being misspelled. in the "soffa" example, one "f" is missing. while it works fine for misspells e.g it works fine for "vovlo". When I send "vovlo" it gives me "volvo".
Any help would be hugely appreciated.
Try changing the "string_distance".
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester