I would like an analyzer that behaves like the standard english analyzer and also treats a given set of words as synonyms during search.
This is the definition I tried:
{
  "analysis": {
    "filter": {
      "synonym_en": {
        "type": "synonym",
        "synonyms": [
          "universe, cosmos",
          "women, woman",
          "man, men"
        ]
      },
      "my_filter": {
        "type": "word_delimiter",
        "preserve_original": "false",
        "split_on_numerics": "false"
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "filter": [
          "my_filter"
        ],
        "tokenizer": "keyword"
      },
      "my_english": {
        "type": "english",
        "stopwords": [
          "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
          "if", "into", "is", "it", "of", "on", "or", "such", "that", "the",
          "their", "then", "there", "these", "they", "this", "to", "was",
          "will", "with"
        ],
        "filter": [
          "synonym_en"
        ]
      }
    }
  }
}
However, I could not get it to work. When I run this example:
GET /my_index/_analyze?analyzer=my_english&text='Men'
It only returns the token men, while I would like to have both man and men.
Please also note that a simpler analyzer
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english",
            "queen,monarch",
            "man,men"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
seems to work, as it returns both man and men.
How can I get the desired synonym behavior together with the stemming from the English analyzer?
This happens because "filter" (and therefore your synonym filter) is not a configurable parameter of the built-in "english" analyzer. There is a difference between a custom analyzer and a built-in analyzer: built-in analyzers only allow certain parameters to be configured. For the language analyzers these are the stopwords and the stem exclusion list, so the remaining parameters in my_english, which is just a configured english analyzer, are ignored. Throwing an error would probably be the more appropriate behaviour here.
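To illustrate, here is a minimal sketch of a definition that only uses parameters the built-in english analyzer accepts. The "_english_" value refers to the predefined English stopword list, and the word in "stem_exclusion" is purely illustrative, not something taken from your mapping:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stopwords": "_english_",
          "stem_exclusion": ["universe"]
        }
      }
    }
  }
}
```

Note that there is no place for a synonym filter here, which is why the "filter" entry in your definition has no effect.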
With a custom analyzer, on the other hand, you choose the tokenizer yourself and can add whatever token filters and char filters you need.
Anyway, if you want to use a synonym filter with the english analyzer, you need to create a custom analyzer that reimplements the english analyzer, as specified in the Elasticsearch documentation on language analyzers. You can then add the synonym filter to it.
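A rough sketch of what that could look like, assuming the english analyzer is rebuilt from the standard tokenizer plus the possessive stemmer, lowercase, stop and stemmer filters. The names english_stop, english_stemmer, english_possessive_stemmer and english_with_synonyms are arbitrary labels chosen for this example, and placing synonym_en after lowercase but before the stemmer is an assumption on my part, so that the synonym rules match the unstemmed, lowercased tokens:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": { "type": "stop", "stopwords": "_english_" },
        "english_stemmer": { "type": "stemmer", "language": "english" },
        "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
        "synonym_en": {
          "type": "synonym",
          "synonyms": ["universe, cosmos", "women, woman", "man, men"]
        }
      },
      "analyzer": {
        "english_with_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "synonym_en",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
```

You can check the result with the same _analyze request as in the question, using analyzer=english_with_synonyms instead of my_english. If the filter order above suits your Elasticsearch version, the text 'Men' should then produce tokens for both man and men.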