Are there any Lucene stemmers that handle Shakespearean English?


I'm trying to index some old documents for searching -- 16th, 17th, 18th century.

Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh.

Are there stemmers that specialize in the English from the time of Shakespeare and the King James Bible? I'm currently using solr.PorterStemFilterFactory.

There is 1 answer

Alexandre Rafalovitch (best answer)

It looks like the rule changes needed for those archaic endings are minimal.

So it might be possible to copy and modify the PorterStemmer class and its related factories and filters.

Alternatively, it might be possible to add those specific rules as a regular-expression filter that runs before the Porter stemmer.
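As a rough sketch of the regular-expression route, the plain-Java snippet below rewrites a trailing "-eth" to "-es", so that a token like "worketh" reaches the Porter stemmer looking like a modern third-person form. The class name, the pattern, the minimum stem length of three letters, and the protected-word list are all assumptions made for illustration; they are not taken from Lucene or Solr. In a Solr analysis chain the same pattern could probably be tried with solr.PatternReplaceFilterFactory placed ahead of solr.PorterStemFilterFactory, but that would need testing against the actual corpus.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
class ArchaicEndingNormalizer {
    // Assumed rule: three or more letters followed by a final "eth".
    // The length check keeps short words such as "teeth" untouched.
    private static final Pattern ETH = Pattern.compile("^([a-z]{3,})eth$");

    // Assumed exception list for words that end in "eth" but are not verbs.
    private static final Set<String> PROTECTED = Set.of("elizabeth", "twentieth");

    // Lowercases the token and rewrites a trailing "-eth" to "-es".
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        if (PROTECTED.contains(lower)) {
            return lower;
        }
        return ETH.matcher(lower).replaceFirst("$1es");
    }

    public static void main(String[] args) {
        for (String w : List.of("worketh", "liveth", "walketh", "teeth")) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

A real deployment would likely need a longer exception list (proper names, ordinals) and possibly further rules for other archaic forms, so treat this as a starting point rather than a complete rule set.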