I'm trying to index some old documents for searching, dating from the 16th, 17th, and 18th centuries.
Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh.
Are there stemmers that specialize in the English from the time of Shakespeare and the King James Bible? I'm currently using solr.PorterStemFilterFactory.
It looks like the rule changes needed to handle these endings are minimal.
So it might be possible to copy and modify the PorterStemmer class and the related factories and filters.
Or it might be possible to add those specific rules as a regular-expression filter that runs before the Porter filter.
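
To make the regex idea concrete, here is a rough sketch of the kind of rule I have in mind, written as plain Java so it can be tried outside Solr. The class name, the method name, and the rule itself (rewrite a trailing "eth" to "es" when at least three letters precede it, so that the stemmer sees a modern-looking form) are my own assumptions, not anything that ships with Solr or Lucene.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the archaic "-eth" verb ending to "-es"
// so that a modern stemmer run afterwards sees a familiar suffix.
public class ArchaicSuffixNormalizer {
    // Assumed rule: at least three letters, then "eth" at the end of the token.
    // The three-letter minimum keeps short words such as "teeth" untouched.
    private static final Pattern ETH = Pattern.compile("^([a-z]{3,})eth$");

    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // replaceFirst returns the input unchanged when the pattern does not match
        return ETH.matcher(lower).replaceFirst("$1es");
    }

    public static void main(String[] args) {
        for (String w : new String[] {"worketh", "liveth", "walketh", "teeth"}) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

In Solr the same pattern could presumably go into a solr.PatternReplaceFilterFactory placed ahead of the Porter filter in the analyzer chain. The rule is naive: names and ordinals such as "Elizabeth" or "twentieth" would also be rewritten, and irregular forms like "hath" and "doth" are not covered, so some kind of protected-word list would probably be needed.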