Solr SnowballPorterFilterFactory for index and query analyzers

Question

Solr SnowballPorterFilterFactory for index and query analyzers

7.6k views Asked by ZendMind At 15 May 2012 at 20:40

I use SnowballPorterFilterFactory for index and query analyzers. When i search for "profession" word. Solr successfully finds only articles that contains "profession", but i want "professional" "professionalism" ...

This is the current configuration on schema.xml

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
</fieldType>

Original Q&A

There are 1 answers

**harmstyler** · Answer 1 · 2012-05-15T22:28:04+00:00

What is happening is porter is over-stemming your query. When you search for profession your keyword gets stemmed down to profess, whereas profession professional and professionalism are all stored in the index as profession.

The only real way you are going to get around this is by adding another fieldType where you do not stem your query.

Something like:

<fieldType name="text_unstemmed_query" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
</fieldType>

With a copyfield like:

<copyField source="your_text_field" dest="text_unstem_query_field"/>

TechQA.

Solr SnowballPorterFilterFactory for index and query analyzers

There are 1 answers

Related Questions in SEARCH

Related Questions in SOLR

Related Questions in SNOWBALL

Related Questions in SNOWBALLANALYZER

Popular Questions

Popular Tags

Trending Questions