I have used the Metaphone and Soundex encoders with the "Phonetic Token Filter" in Elasticsearch.
Metaphone is good for English words.
Soundex is good for English as well as Hindi, and maybe for many other languages too.
I want to know which of these encoders is best suited to Hindi and, if possible, to other Indian languages:
- Soundex
- Metaphone
- double_metaphone
- refined_soundex
- caverphone1 - English (New Zealand localised)
- caverphone2 - English (New Zealand localised)
- cologne - German
- nysiis - Improved Soundex
- koelnerphonetik - German
- haasephonetik - German
- beider_morse - English and multiple European languages
- daitch_mokotoff - Slavic & Yiddish surnames
I am asking because the Elasticsearch website does not say which encoder to choose for which language.
Also, please tell me which of these encoders you have already used, and for which languages.
Phonetic encoders are algorithms for indexing words by their pronunciation.
An explanation of this is available on Wikipedia.
References: details of the above algorithms and their subtypes are available on the Wikipedia page below.
1. https://en.wikipedia.org/wiki/Phonetic_algorithm
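To make "indexing words by their pronunciation" concrete, here is a minimal sketch of classic American Soundex in Python. It is only an illustration of the idea, not the implementation Elasticsearch uses, and the function name `soundex` is my own. Note that it only understands the letters A-Z, which is exactly why plain Soundex needs adapting for Indic scripts.

```python
# Minimal sketch of classic American Soundex (Latin letters only).
# Illustration only; not the code Elasticsearch runs internally.

# Consonant groups that sound alike share one digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, or "" if there are no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        code = CODES.get(ch, "")       # vowels (and Y) map to ""
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev, allowing repeats
    return (result + "000")[:4]        # pad with zeros, truncate to 4


print(soundex("Robert"), soundex("Rupert"))  # both give R163
```

Words that sound similar collapse to the same key, so a search for one finds the other.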
Among the above, Soundex is the most suitable for Indian languages. You can check the resources below.
1. Phonetic search for Indian languages
2. https://thottingal.in/blog/2009/07/26/indicsoundex/
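If you want to try Soundex through the Phonetic Token Filter, the index settings could look roughly like the sketch below, written as a Python dict that you would send as the body of the create-index request. The filter name `my_soundex` and analyzer name `my_phonetic_analyzer` are hypothetical names chosen for illustration; `type: phonetic`, `encoder` and `replace` are the plugin's documented parameters. I have not verified here how the stock encoder behaves on Devanagari text, so you may need to transliterate first or use an Indic-aware variant such as the one in the second link.

```python
import json

# Sketch of index settings for the analysis-phonetic plugin.
# "my_soundex" and "my_phonetic_analyzer" are hypothetical names.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_soundex": {
                    "type": "phonetic",    # Phonetic Token Filter
                    "encoder": "soundex",  # any encoder from the list above
                    "replace": False,      # keep the original token too
                }
            },
            "analyzer": {
                "my_phonetic_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_soundex"],
                }
            },
        }
    }
}

# Print the JSON body to send when creating the index.
print(json.dumps(index_body, indent=2))
```

Swapping the `encoder` value lets you compare the encoders on your own Hindi test data, which is probably the most reliable way to pick one.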