improve lucene.net analyzer

Question

Asked by Gnomo on 17 February 2014 at 12:06 (206 views)

I'm using Lucene.Net and the Snowball analyzer in an ASP.NET application.

In the language I'm working with, I have the following issue: two specific words with different meanings produce the same result after stemming, so a search for either of them returns results for both.

How can I teach the analyzer either not to stem these two words or, while still stemming them, to know that they have different meanings?

There are 2 answers

**femtoRgon** · Answer 1 · 2014-02-19T00:09:38+00:00

With Lucene 4.0, EnglishAnalyzer now has this ability, since it has a constructor that takes a stemExclusionSet.

Of course, Lucene.Net isn't up to Lucene 4 yet, so fat lot of good that does.

However, EnglishAnalyzer does this by using a KeywordMarkerFilter. So you can create your own Analyzer, overriding the tokenStream method, and adding into the chain a KeywordMarkerFilter just before the SnowballFilter.

Something like:

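// stopSet, stemExclusionSet and name are assumed to be fields of your Analyzer.
public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    if (stopSet != null)
        result = new StopFilter(result, stopSet);
    // Flag excluded terms as keywords so the stemmer leaves them untouched.
    result = new KeywordMarkerFilter(result, stemExclusionSet);
    // name selects the Snowball stemmer language.
    result = new SnowballFilter(result, name);
    return result;
}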

You'll need to construct your own stemExclusionSet (see CharArraySet).
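To make the mechanism concrete without tying it to a particular Lucene or Lucene.Net version, here is a minimal sketch in plain Java of the same chain: lowercase, drop stop words, flag excluded terms, then stem everything that was not flagged. The class `StemExclusionSketch`, the method `analyze`, and the `toyStem` function are all hypothetical names made up for illustration; `toyStem` only strips a trailing "s" and is not the Snowball algorithm. It is there purely to produce a collision like the one described in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StemExclusionSketch {

    // Toy stand-in for a real stemmer (NOT Snowball): strips one trailing "s".
    static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Mirrors the filter order of the chain above:
    // lowercase -> stop filter -> keyword marker -> stemmer.
    static List<String> analyze(String text, Set<String> stopSet,
                                Set<String> stemExclusionSet) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || stopSet.contains(term)) {
                continue; // stop words are removed from the stream entirely
            }
            // Terms in the exclusion set play the role of "keyword" tokens:
            // they stay in the stream but bypass the stemmer.
            boolean isKeyword = stemExclusionSet.contains(term);
            terms.add(isKeyword ? term : toyStem(term));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the");
        // Without exclusions, "new" and "news" collapse to the same term.
        System.out.println(analyze("the new news", stop, Set.of()));       // [new, new]
        // With "news" excluded from stemming, the two terms stay distinct.
        System.out.println(analyze("the new news", stop, Set.of("news"))); // [new, news]
    }
}
```

The sketch also shows how the two sets behave differently: a term in the stop set disappears from the output, while a term in the exclusion set is kept in its unstemmed form. The same exclusion set has to be applied at index time and at query time, otherwise the indexed terms and the query terms will not match.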

**Lord Darth Vader** · Answer 2 · 2014-02-17T13:28:48+00:00

I am working from memory here, but as I recall, one of the constructors lets you pass an array of stop words, which will stop the passed-in words from being stemmed.
