I'm using Lucene.Net and the Snowball analyzer in an ASP.NET application.
In one of the languages I'm working with, I have the following issue: two specific words with different meanings are reduced to the same stem, so a search for either of them returns results for both.
How can I tell the analyzer either not to stem these two words or, if it does stem them, to still treat them as distinct?
With Lucene 4.0, EnglishAnalyzer now has this ability, since it has a constructor that takes a stemExclusionSet.
Of course, Lucene.Net isn't up to Lucene 4 yet, so a fat lot of good that does.
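However, EnglishAnalyzer does this by using a KeywordMarkerFilter, which flags the tokens found in the exclusion set as keywords; SnowballFilter then passes keyword tokens through without stemming them. So you can create your own Analyzer that overrides the tokenStream method and adds a KeywordMarkerFilter to the chain just before the SnowballFilter. Something like: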
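A minimal, self-contained Java sketch of that chain follows. It deliberately avoids Lucene so it can run on its own: the Token class and the tokenize, markKeywords, stem and analyze methods are hypothetical stand-ins for the tokenizer/lowercasing, KeywordMarkerFilter and SnowballFilter stages, and toyStem is a hypothetical toy stemmer (it only drops a trailing "s") that is nothing like real Snowball. What it illustrates is the ordering: the marker stage runs before the stemmer and flags excluded words, and the stemmer leaves flagged tokens alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordMarkerSketch {

    // A token plus a "keyword" flag, loosely mirroring Lucene's KeywordAttribute.
    static final class Token {
        final String text;
        final boolean keyword;

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    // Stand-in for the tokenizer + lowercasing stage: split on whitespace and lowercase.
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        for (String w : input.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new Token(w.toLowerCase(Locale.ROOT), false));
            }
        }
        return out;
    }

    // Stand-in for KeywordMarkerFilter: flag tokens that appear in the exclusion set.
    static List<Token> markKeywords(List<Token> in, Set<String> stemExclusionSet) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(new Token(t.text, t.keyword || stemExclusionSet.contains(t.text)));
        }
        return out;
    }

    // HYPOTHETICAL toy stemmer (not Snowball): drops a trailing "s" from words longer than 3 chars.
    static String toyStem(String w) {
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Stand-in for SnowballFilter: stem every token except those flagged as keywords.
    static List<String> stem(List<Token> in) {
        List<String> out = new ArrayList<>();
        for (Token t : in) {
            out.add(t.keyword ? t.text : toyStem(t.text));
        }
        return out;
    }

    // The whole chain in analyzer order: tokenize, then mark keywords, then stem.
    static List<String> analyze(String input, Set<String> stemExclusionSet) {
        return stem(markKeywords(tokenize(input), stemExclusionSet));
    }

    public static void main(String[] args) {
        // Without an exclusion set, "new" and "news" collapse to the same term.
        System.out.println(analyze("new news", Set.of()));         // prints [new, new]
        // With "news" excluded from stemming, the two terms stay distinct.
        System.out.println(analyze("new news", Set.of("news")));   // prints [new, news]
    }
}
```

In a real Lucene 3.x analyzer the same ordering would be expressed by wrapping the token stream in a KeywordMarkerFilter and then in a SnowballFilter inside the overridden tokenStream method; the exact constructor signatures vary between Lucene versions and the Lucene.Net port, so check them against the version you use.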
You'll need to construct your own stemExclusionSet (see CharArraySet).
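Whatever form the exclusion set takes, its entries have to match the tokens exactly as the marker stage sees them, and in a typical analyzer chain that is after a lowercasing filter. CharArraySet can be created with an ignoreCase flag for this purpose; the plain-Java sketch below imitates that by lowercasing entries on the way in. The names ExclusionSetSketch, buildStemExclusionSet and isExcluded are hypothetical helpers for illustration, not Lucene API.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExclusionSetSketch {

    // Builds the set of words that must not be stemmed. Entries are trimmed and lowercased
    // so they match tokens that already went through a lowercasing stage
    // (roughly what a CharArraySet with ignoreCase = true gives you).
    static Set<String> buildStemExclusionSet(Collection<String> words) {
        Set<String> set = new HashSet<>();
        for (String w : words) {
            String trimmed = w.trim();
            if (!trimmed.isEmpty()) {
                set.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return set;
    }

    // Case-insensitive membership check, as the marker stage would perform it.
    static boolean isExcluded(Set<String> set, String token) {
        return set.contains(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> excl = buildStemExclusionSet(List.of("News", " Building "));
        System.out.println(isExcluded(excl, "NEWS"));    // prints true
        System.out.println(isExcluded(excl, "builds"));  // prints false
    }
}
```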