Is there a way to boost the original term more when using Solr synonyms?


For example, I have the synonyms laptop,netbook,notebook in index_synonyms.txt.

When a user searches for netbook, I want matches on the original term to be boosted more than matches on terms expanded by synonyms. Is there a way to specify this in SynonymFilterFactory? For example, by emitting the original term twice so that its TF is higher?

Answer by Siddhartha Reddy:

As far as I know, there is no way to do this with the existing SynonymFilterFactory. However, the following trick gets you this behavior.

Let's say your field is called title. Create another field that is a copy of it, say title_synonyms. Make sure SynonymFilterFactory appears only in the analyzer chain for title_synonyms (you can do this by using different field types for the two fields, say text and text_synonyms). Then search both fields, but give title a higher boost than title_synonyms. A document containing the term the user actually typed matches in both fields, while a document containing only a synonym matches in title_synonyms alone, so it tends to score lower.

Here are sample field type definitions:

    <fieldType name="text" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="text_synonyms" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

And here are sample field definitions:

    <field name="title" type="text" stored="false"
           required="true" multiValued="true"/>
    <field name="title_synonyms" type="text_synonyms" stored="false"
           required="true" multiValued="true"/>

Copy the title field into title_synonyms:

    <copyField source="title" dest="title_synonyms"/>

If you are using dismax, you can give different boosts to these fields like so:

    <str name="qf">title^10 title_synonyms^1</str>
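For context, here is a minimal sketch of where that qf line might live in solrconfig.xml. The handler name /select and the choice to put defType in the handler defaults are assumptions for illustration, not part of the setup above; adapt them to whatever handler you already use.

```xml
<!-- Hypothetical request handler; the name "/select" is an assumption. -->
<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        <!-- Use the dismax query parser so that qf is honored. -->
        <str name="defType">dismax</str>
        <!-- Matches on the exact field count ten times as much as
             matches that only occur in the synonym-expanded field. -->
        <str name="qf">title^10 title_synonyms^1</str>
    </lst>
</requestHandler>
```

With this in place, a plain q=netbook request is searched against both fields. The 10:1 ratio is only a starting point, so tune it against your own data.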