OpenSearchServer tokenizer for matching any permutation of the words in a query


I need to configure OpenSearchServer to analyse the query so that if any permutation of the words in the query matches, the document is returned.

For example, an indexed field contains the phrase "knee pain". If my query is "how to remove pain in human knee", I want it to return the document whose indexed field contains "knee pain".

Hence I need to break the query string into "remove", "pain", "human", "knee", "remove pain", "remove knee", "remove human", "pain knee", "human knee", "knee pain", "human pain", etc., so that it matches "knee pain".

Is there any tokenizer or filter that can help me achieve this?
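The breakdown described in the question can be sketched in plain Python: drop stop words, then keep every single word plus every ordered pair of words. This is only an illustration of the requirement, not something OpenSearchServer provides; the stop-word list is a hypothetical example.

```python
from itertools import permutations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"how", "to", "in"}

def query_terms(query: str) -> set[str]:
    """Return every single word plus every ordered pair of words."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    terms = set(words)
    # permutations(words, 2) yields every ordered pair of distinct positions,
    # so both "pain knee" and "knee pain" are produced.
    terms.update(" ".join(pair) for pair in permutations(words, 2))
    return terms

terms = query_terms("how to remove pain in human knee")
print("knee pain" in terms)  # True
print(len(terms))  # 16
```

With n remaining words this produces n(n-1) ordered pairs, so the term count grows quickly with query length; longer permutations (3 or more words) grow much faster still.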

Answer by Fix It Scotty

Select your index, click on the Schema tab, and then click the Analyzers tab.

I normally edit the TextAnalyzer and add extra filters to it. I start with the lowercase filter and the stop filter, which make searches case-insensitive and remove stop words such as "a", "an", and "the".
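The combined effect of those two filters can be sketched in plain Python. This only mimics the behavior and is not OpenSearchServer's implementation; the regex-based word splitting and the stop-word list are assumptions.

```python
import re

# Hypothetical stop-word list; a real stop filter uses its own configured list.
STOP_WORDS = {"a", "an", "the"}

def analyze(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"\w+", text.lower())  # split into words + lowercase
    return [w for w in words if w not in STOP_WORDS]  # stop-word removal

print(analyze("The Brown Fox jumps HIGH"))  # ['brown', 'fox', 'jumps', 'high']
```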

Then, the Shingle filter will give you the n-grams needed for phrase matches. A shingle size of 3 to 4 words usually works. Shingling creates overlapping runs of adjacent words (word n-grams) from the analyzed text, keeping their original order. "The brown fox jumps high" with a shingle size of 3 would create analyzed n-grams of 1, 2, and 3 words, i.e. 1-word: "the", "brown", "fox", "jumps", "high"; 2-word: "the brown", "brown fox", "fox jumps", "jumps high"; 3-word: "the brown fox", "brown fox jumps", "fox jumps high".
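A minimal Python sketch of the shingling described above, assuming single words are emitted alongside the longer shingles. The function name `shingles` is hypothetical; this version groups its output by size for readability, whereas a real filter's output order and options depend on its configuration.

```python
def shingles(tokens: list[str], max_size: int = 3) -> list[str]:
    """Return all word n-grams of length 1..max_size, in word order."""
    result = []
    for size in range(1, max_size + 1):
        # Slide a window of `size` adjacent tokens across the list.
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result

grams = shingles(["the", "brown", "fox", "jumps", "high"], max_size=3)
print(grams[5:9])  # ['the brown', 'brown fox', 'fox jumps', 'jumps high']
```

Because shingles follow word order, the query tokens "remove pain human knee" yield "human knee" but not "knee pain"; the single-word tokens "knee" and "pain" still match individually.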
