I'm working on a Spanish search engine (I don't speak Spanish). Based on my research, the goal is roughly this:
1. Filter stopwords such as "dos", "de", "la", ...
2. Stem words at both search time and index time. E.g., if you search for "primera", then "primero" and "primer" should also show up.
My attempt:
es_analyzer = {
    "settings": {
        "analysis": {
            "filter": {
                "spanish_stop": {
                    "type": "stop",
                    "stopwords": "_spanish_"
                },
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "spanish"
                }
            },
            "analyzer": {
                "default_search": {
                    "type": "spanish"
                },
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stop",
                        "spanish_stemmer"
                    ]
                }
            }
        }
    }
}
The problem:
When I use "type": "spanish" in "default_search", my query "primera" gets stemmed to "primer", which is correct. But even though I specified "spanish_stemmer" in the filter of "rebuilt_spanish", the documents in the index aren't stemmed. As a result, when I search for "primera", it only shows exact matches for "primer". Any suggestions on fixing this?
Potential fixes, though I haven't figured out the syntax:
- Use the built-in "spanish" analyzer in the filter. What's the syntax?
- Add the Spanish stemmer and stopwords to "default_search". But I don't know how to use compound settings there.
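For reference, a minimal sketch of how the two ideas could look, assuming Elasticsearch 7+ typeless mappings and a hypothetical text field called "content". In Elasticsearch, an analyzer registered under the name "default" is used at index time for fields that don't set their own analyzer, and also at search time when no "default_search" is defined, so both sides get the same stems. The second helper instead attaches the built-in "spanish" analyzer to the field in the mapping. Note that the built-in analyzer is documented as using the "light_spanish" stemmer, so mixing it with a "spanish" stemmer on the other side may produce different stems.

```python
import json

# Filters and analyzer chain copied from the attempt above.
ANALYSIS = {
    "filter": {
        "spanish_stop": {"type": "stop", "stopwords": "_spanish_"},
        "spanish_stemmer": {"type": "stemmer", "language": "spanish"},
    },
    "analyzer": {
        # Registered as "default" so it applies to indexing and searching.
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
        }
    },
}


def custom_default_body(field="content"):
    """Index body using the custom chain as the default analyzer.

    The field name is a hypothetical placeholder.
    """
    return {
        "settings": {"analysis": ANALYSIS},
        "mappings": {"properties": {field: {"type": "text"}}},
    }


def builtin_spanish_body(field="content"):
    """Index body that sets the built-in "spanish" analyzer on the field."""
    return {
        "mappings": {
            "properties": {field: {"type": "text", "analyzer": "spanish"}}
        }
    }


if __name__ == "__main__":
    print(json.dumps(custom_default_body(), indent=2))
    print(json.dumps(builtin_spanish_body(), indent=2))
```

Either body would be sent when creating the index; existing documents need to be reindexed for a changed analyzer to take effect.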
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
Index Data:
Search Query:
Search Result: