Lucene using Snowball and SpellChecker brings back strange values


I am trying to set up SpellChecker using Lucene.NET. It all works fine except in situations like the following:

I have text containing "satellite" in the index, which I analyze using Snowball.

I then create a SpellChecker index and get suggestions from it. When I pass in "Satalite", the suggestion returned is "satellit".

I assume this is because Snowball stems "satellite" down to "satellit", so SpellChecker returns the stem as its suggestion.

Is there any way around this, so that I can use the two together without creating an additional field of non-stemmed words just for the spell checker to check against?


There are 2 answers

John_ On BEST ANSWER

As Shashikant mentioned above:

You are right, this happens due to stemming. Unfortunately, the stemmed words are meant only for search, and outside search they can be meaningless. I don't know of any technique other than storing the text multiple times. The additional field can be configured to store as little information as possible to reduce the burden. – Shashikant Kore Dec 2 at 14:08
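
To make the two-field idea concrete, here is a minimal sketch in plain Python rather than Lucene.NET. Everything in it is an assumption for illustration: `toy_stem` is a crude stand-in for Snowball, the field names `body_stemmed` and `body_raw` are hypothetical, and `difflib` stands in for SpellChecker. It only shows why a dictionary built from the stemmed field suggests a stem, while one built from an unstemmed field suggests a real word.

```python
import difflib

def toy_stem(word):
    """Crude stand-in for a Snowball stemmer (assumption, not the real algorithm)."""
    word = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_fields(text):
    """Index the same text twice: stemmed for search, raw for spell checking."""
    tokens = [t.lower() for t in text.split()]
    return {
        "body_stemmed": {toy_stem(t) for t in tokens},  # hypothetical search field
        "body_raw": set(tokens),                        # hypothetical spell-check field
    }

def suggest(word, dictionary):
    """Return the closest dictionary term, standing in for SpellChecker."""
    return difflib.get_close_matches(word.lower(), sorted(dictionary), n=1, cutoff=0.6)

fields = build_fields("satellite images show weather patterns")
from_stemmed = suggest("Satalite", fields["body_stemmed"])
from_raw = suggest("Satalite", fields["body_raw"])
print(from_stemmed, from_raw)
```

In this sketch the stemmed dictionary yields the stem and the raw dictionary yields the full word, which mirrors the behaviour described in the question. The raw field only needs its terms, so in a real index it could probably be kept small.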

Haldrich98 On

Have you considered putting the words generated by the Snowball filter in as synonyms? That is the direction I'm going in. I don't know how well it will work, but it seems plausible. The spellchecker would then return the right words, while my searches could still find the stemmed variant.
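
One possible reading of the synonym idea, sketched in plain Python rather than Lucene.NET: each original word is emitted together with its stem at the same position, the way a synonym would be injected. The `toy_stem` function, the `kind` tag, and the choice to build the spell dictionary only from `"word"` terms are all assumptions made for illustration. Whether stems can be kept out of the spell dictionary this way within a single Lucene field is not established here.

```python
import difflib

def toy_stem(word):
    """Crude stand-in for a Snowball stemmer (assumption, not the real algorithm)."""
    word = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Return (term, position, kind) triples; a stem shares its word's position."""
    out = []
    for pos, token in enumerate(text.lower().split()):
        out.append((token, pos, "word"))
        stem = toy_stem(token)
        if stem != token:
            # Emit the stem like a synonym: same position, different kind tag.
            out.append((stem, pos, "stem"))
    return out

tokens = analyze("satellite images")

# Search sees both the original words and their stems.
search_terms = {term for term, _, _ in tokens}
# The spell dictionary is built only from original words (hypothetical filter).
spell_terms = sorted({term for term, _, kind in tokens if kind == "word"})

query_hit = toy_stem("satellites") in search_terms
suggestion = difflib.get_close_matches("satalite", spell_terms, n=1, cutoff=0.6)
print(query_hit, suggestion)
```

If the stems cannot be filtered out of the spell dictionary, this approach collapses back into the original problem, since the stem would again be a candidate suggestion.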