I am reading the official doc at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html and I do not understand how the search_as_you_type field works.
I have the following setting:
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "partial_words": {
          "type": "custom",
          "tokenizer": "ngrams",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "fields": {
          "shingles": {
            "type": "search_as_you_type",
            "analyzer": "partial_words",
            "term_vector": "with_positions_offsets"
          },
          "ngrams": {
            "type": "text",
            "analyzer": "partial_words",
            "search_analyzer": "standard",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}
I would like to know how the my_text.shingles field is tokenized. For instance, the text
"Martin Luther was a german priest"
is analyzed at index time in the "my_text" field with the "partial_words" analyzer. How does it work in the shingles subfields? Which tokens should I get in:
1) my_text.shingles
2) my_text.shingles._2gram
3) my_text.shingles._3gram
Thanks for shedding some light on this!
EDIT: is there any way (or any query I can run) to verify that the n-gram subfields actually produce the following tokens?
1) my_text.shingles
[Martin, Luther, was, a, german, priest]
2) my_text.shingles._2gram
[Martin Luther, Luther was, was a, a german, german priest]
3) my_text.shingles._3gram
[Martin Luther was, Luther was a, was a german, a german priest]
You can check this article to understand more. Simply put, the _2gram and _3gram subfields tokenize the text into shingles of two and three words, exactly as in the lists above.
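Under the hood, each ._Ngram subfield behaves like the base analyzer wrapped in a shingle token filter of size N. As a minimal sketch (assuming the default standard tokenizer rather than your custom ngrams tokenizer), the _2gram output can be reproduced with an ad-hoc _analyze request:

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "shingle",
      "min_shingle_size": 2,
      "max_shingle_size": 2,
      "output_unigrams": false
    }
  ],
  "text": "Martin Luther was a german priest"
}

This returns the two-word shingles [Martin Luther, Luther was, was a, a german, german priest]; setting both shingle sizes to 3 gives the _3gram list.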
You can use the _analyze API to see how the text is tokenized.
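For example (assuming your index is named my_index), you can point _analyze at the field itself via the field parameter, so it picks up the exact analyzer configured in your mapping, and compare the tokens with your lists:

GET my_index/_analyze
{
  "field": "my_text.shingles._2gram",
  "text": "Martin Luther was a german priest"
}

Repeat with "my_text.shingles" and "my_text.shingles._3gram" to check the other two lists.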