Is there a way to create a new stemmer? For example, there is already a built-in analyzer for the Czech language with a Czech stemmer. The algorithm was made by some guys in the Netherlands. It's not that bad, but to a native speaker it is clear that those honorable guys do not speak the language. If I wanted to create my own stemming algorithm, how could I do it in Elasticsearch?
Thanks.
Elasticsearch is based on Lucene, so this answer is about how to add a custom stemmer to Lucene.
This is how I implemented Lucene's Analyzer interface based on a custom stemmer (or lemmatizer, to be more precise):
https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/lucene/analysis/StemmerAnalyzer.java
See also these two classes: https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/lucene/analysis/CompoundStemmerTokenFilter.java
https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/jmorph/LemmatizerWrapper.java
Note that this is for an older version of Lucene, 3.2/3.3 (see the project's pom.xml: https://code.google.com/p/hunglish-webapp/source/browse/trunk/pom.xml). The same implementation would probably be simpler in newer versions.
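To give an idea of what the stemming logic itself might look like, here is a minimal sketch of a suffix-stripping stemmer in plain Java. It is not taken from the linked project: the class name SuffixStemmer, the suffix list and the minimum stem length are all made up for illustration, and a real Czech stemmer would need far more careful rules. It has no Lucene dependency; in Lucene you would typically call something like this from a TokenFilter subclass inside incrementToken() and write the result back to the CharTermAttribute.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical example class, not part of Lucene or Elasticsearch.
class SuffixStemmer {

    // Illustrative suffix list only (assumption), ordered longest first
    // so that the longest matching suffix is the one removed.
    private static final List<String> SUFFIXES =
            Arrays.asList("ami", "ech", "em", "y", "u", "a");

    // Never shorten a word below this many characters (assumed threshold).
    private static final int MIN_STEM_LENGTH = 3;

    // Lowercases the token and strips the first matching suffix, if the
    // remaining stem is long enough; otherwise returns the lowercased token.
    static String stem(String token) {
        String word = token.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = word.length() - suffix.length();
            if (word.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }
}
```

With this sketch, SuffixStemmer.stem("hradech") and SuffixStemmer.stem("hrady") both give "hrad", while a short word such as "oka" is left alone because stripping would leave fewer than three characters. Keeping the rules in a separate class like this makes them easy to unit test before wiring them into an analyzer.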