I'm trying to build an efficient auto-complete search input on my website to search for cities. I assume that people will always start typing their city name from the beginning, with the words in the right order.
E.g. a user who lives in Saint-Maur will type sai.. but will never type mau.. first.
I need to improve the score of a result if it starts with the term from the query. E.g. if a user types pari, the city Parigné-le-Pôlin should have a better score than Fontenay-en-Parisis, since it starts with pari.
I'm using an edge_ngram filter, and a phrase match because the order of words matters. I'm sure that my problem has a simple solution, but I'm a newbie in the ES magic world :)
Here is my mapping:
{
  "settings": {
    "index": {
      "number_of_shards": 1
    },
    "analysis": {
      "analyzer": {
        "partialPostalCodeAnalyzer": {
          "tokenizer": "standard",
          "filter": ["partialFilter"]
        },
        "partialNameAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter", "partialFilter"]
        },
        "searchAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter"]
        }
      },
      "filter": {
        "partialFilter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "village": {
      "properties": {
        "postalCode": {
          "type": "string",
          "index_analyzer": "partialPostalCodeAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "name": {
          "type": "string",
          "index_analyzer": "partialNameAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "population": {
          "type": "integer",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Some sample data:
PUT /tv_village/village/1 {"name": "Paris"}
PUT /tv_village/village/2 {"name": "Parigny"}
PUT /tv_village/village/3 {"name": "Fontenay-en-Parisis"}
PUT /tv_village/village/4 {"name": "Parigné-le-Pôlin"}
If I perform this query, you can see that the results are not in the order I want (I want the 4th result to come before the 3rd one):
GET /tv_village/village/_search
{
  "query": {
    "match_phrase": {
      "name": "pari"
    }
  }
}
Results:
"hits": [
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "1",
    "_score": 0.7768564,
    "_source": {
      "name": "Paris"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "2",
    "_score": 0.7768564,
    "_source": {
      "name": "Parigny"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "3",
    "_score": 0.3884282,
    "_source": {
      "name": "Fontenay-en-Parisis"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "4",
    "_score": 0.3884282,
    "_source": {
      "name": "Parigné-le-Pôlin"
    }
  }
]
In your mapping definition, add another analyzer, keywordLowercaseAnalyzer, meaning: keep the word intact (through keyword) and lowercase it (like "parigné-le-pôlin"). Then define another two fields for your name field:
- raw, which should be not_analyzed
- raw_lowercase, which should use keywordLowercaseAnalyzer

I'm doing this because you can have searches for "pari" or "Pari". In your query, use the rescore functionality to recompute the scoring based on an additional query.

There are two drawbacks, from your use case point of view and regarding the prefix query:
- prefix is not analyzed, and this is the reason for adding those two raw* fields: one field deals with a lowercase version, the other deals with the untouched version, so that queries for "pari" or "Pari" cover both scenarios.

I have two suggestions:
- use the window_size attribute of the rescore query to limit the number of results the rescoring is performed on, thus improving performance.

For your reference, see the documentation page for rescore.
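
Putting the pieces above together, here is a minimal sketch of what the index definition and the rescored query might look like, using the same ES 1.x style syntax as the question. Several details are assumptions rather than something stated above: keywordLowercaseAnalyzer is built here from the keyword tokenizer plus the lowercase filter, the rescore query combines two prefix queries in a bool/should, and the window_size value of 50 is an arbitrary illustration. The postalCode and population fields are left out for brevity, and the lines starting with # are annotations only (drop them if your client does not accept comments).

```
# Assumed index definition: the question's name analyzers plus the extra
# keywordLowercaseAnalyzer and the two sub-fields raw and raw_lowercase.
PUT /tv_village
{
  "settings": {
    "analysis": {
      "analyzer": {
        "partialNameAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter", "partialFilter"]
        },
        "searchAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter"]
        },
        "keywordLowercaseAnalyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      },
      "filter": {
        "partialFilter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "village": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "partialNameAnalyzer",
          "search_analyzer": "searchAnalyzer",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            },
            "raw_lowercase": {
              "type": "string",
              "analyzer": "keywordLowercaseAnalyzer"
            }
          }
        }
      }
    }
  }
}

# Same match_phrase query as in the question, with the top hits rescored by
# prefix queries on the two sub-fields (window_size of 50 is an assumption).
GET /tv_village/village/_search
{
  "query": {
    "match_phrase": {
      "name": "pari"
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "bool": {
          "should": [
            { "prefix": { "name.raw": "pari" } },
            { "prefix": { "name.raw_lowercase": "pari" } }
          ]
        }
      }
    }
  }
}
```

With a setup along these lines, names whose full lowercased value starts with "pari" (Paris, Parigny, Parigné-le-Pôlin) match the prefix query on name.raw_lowercase and get extra score during rescoring, while Fontenay-en-Parisis does not. The sample documents need to be reindexed after the mapping change for the sub-fields to be populated.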