I'm trying to build an efficient autocomplete search input on my website for searching cities. I assume that people will always start typing their city name from the beginning, with the words in the right order.
E.g. a user who lives in Saint-Maur will type "sai.." but will never type "mau.." first.
I need to improve the score of a result if it starts with the term from the query. E.g. if a user types "pari", the city Parigné-le-Pôlin should have a better score than Fontenay-en-Parisis, since it starts with "pari".
I'm using an edge n-gram filter and a phrase match, because the order of words matters. I'm sure that my problem has a simple solution, but I'm a newbie in the ES magic world :)
Here is my mapping:
{
  "settings": {
    "index": {
      "number_of_shards": 1
    },
    "analysis": {
      "analyzer": {
        "partialPostalCodeAnalyzer": {
          "tokenizer": "standard",
          "filter": ["partialFilter"]
        },
        "partialNameAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter", "partialFilter"]
        },
        "searchAnalyzer": {
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "word_delimiter"]
        }
      },
      "filter": {
        "partialFilter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "village": {
      "properties": {
        "postalCode": {
          "type": "string",
          "index_analyzer": "partialPostalCodeAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "name": {
          "type": "string",
          "index_analyzer": "partialNameAnalyzer",
          "search_analyzer": "searchAnalyzer"
        },
        "population": {
          "type": "integer",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Some sample documents:
PUT /tv_village/village/1 {"name": "Paris"}
PUT /tv_village/village/2 {"name": "Parigny"}
PUT /tv_village/village/3 {"name": "Fontenay-en-Parisis"}
PUT /tv_village/village/4 {"name": "Parigné-le-Pôlin"}
If I run this query, you can see that the results are not in the order I want (I want the 4th result to come before the 3rd one):
GET /tv_village/village/_search
{
  "query": {
    "match_phrase": {
      "name": "pari"
    }
  }
}
Results:
"hits": [
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "1",
    "_score": 0.7768564,
    "_source": {
      "name": "Paris"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "2",
    "_score": 0.7768564,
    "_source": {
      "name": "Parigny"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "3",
    "_score": 0.3884282,
    "_source": {
      "name": "Fontenay-en-Parisis"
    }
  },
  {
    "_index": "tv_village",
    "_type": "village",
    "_id": "4",
    "_score": 0.3884282,
    "_source": {
      "name": "Parigné-le-Pôlin"
    }
  }
]
In your mapping definition, add another analyzer, keywordLowercaseAnalyzer, that keeps the whole name intact as a single token (through the keyword tokenizer) and lowercases it (like "parigné-le-pôlin"). Then define two more fields under your name field: raw, which should be not_analyzed, and raw_lowercase, which should use keywordLowercaseAnalyzer.
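The analyzer and the two sub-fields described above could look roughly like the fragment below. This is only a sketch to merge into the existing settings and mapping: it assumes the Elasticsearch 1.x syntax used in the question (string type, multi-fields declared under "fields"), and it takes the names keywordLowercaseAnalyzer, raw and raw_lowercase from the text above.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keywordLowercaseAnalyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "village": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "partialNameAnalyzer",
          "search_analyzer": "searchAnalyzer",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            },
            "raw_lowercase": {
              "type": "string",
              "analyzer": "keywordLowercaseAnalyzer"
            }
          }
        }
      }
    }
  }
}
```

With this in place, the sub-fields would be addressed as name.raw and name.raw_lowercase in queries. Changing analyzers on an existing index normally means recreating the index and reindexing the documents.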
I'm doing this because you can have searches for "pari" or "Pari". In your query, use the rescore functionality to recompute the scoring based on an additional query.
There are two drawbacks, from your use case point of view, regarding the prefix query: the prefix query is not analyzed, and this is the reason for adding those two raw* fields: one field holds a lowercase version, the other holds the untouched version, so that queries for "pari" or "Pari" are both covered.
I have two suggestions: use the window_size attribute of the rescore query to limit the number of results the rescoring is performed on, thus improving performance.
For your reference, see the Elasticsearch documentation page for rescore.
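The rescore approach described above can be sketched as the following request body, sent to GET /tv_village/village/_search as in the question. It is an illustration, not a tuned query: it assumes the name.raw and name.raw_lowercase sub-fields exist, and the window_size, query_weight and rescore_query_weight values are placeholders to adjust.

```json
{
  "query": {
    "match_phrase": {
      "name": "pari"
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "bool": {
          "should": [
            { "prefix": { "name.raw": "pari" } },
            { "prefix": { "name.raw_lowercase": "pari" } }
          ]
        }
      },
      "query_weight": 1,
      "rescore_query_weight": 2
    }
  }
}
```

Documents whose full name starts with the typed text match one of the prefix clauses and get the weighted rescore contribution added to their original score, so Parigné-le-Pôlin should move ahead of Fontenay-en-Parisis. Only the top window_size hits per shard are rescored.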