I am trying to use the NGramFilterFactory in Solr (via Sunspot in Rails) to find similar titles. I managed to add a new field type to my Solr schema.xml as follows:
<fieldType name="text_ngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="4"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
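To make clear what I expect the index analyzer above to store, here is a rough Python sketch of the same pipeline (whitespace split, lowercase, then n-grams of length 2 to 4). This is only an illustration, not Solr's implementation: the function names `ngrams` and `index_terms` are made up for this example, and the order in which Solr emits the grams may differ.

```python
def ngrams(token, min_size=2, max_size=4):
    """Return every substring of `token` whose length is in [min_size, max_size]."""
    grams = []
    for size in range(min_size, max_size + 1):
        # Tokens shorter than `size` yield an empty range and contribute nothing.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_terms(title, min_size=2, max_size=4):
    """Approximate the index analyzer: whitespace split, lowercase, n-grams."""
    terms = []
    for token in title.split():
        terms.extend(ngrams(token.lower(), min_size, max_size))
    return terms


print(ngrams("solr"))  # ['so', 'ol', 'lr', 'sol', 'olr', 'solr']
print(index_terms("Solr Rails"))
```

The query analyzer in my field type has no n-gram filter, so in this sketch a query would only go through the split and lowercase steps.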
Since I am using Sunspot in a Rails app, I added the new field to Sunspot through a dynamic field. This all worked, and I can now search my model using the NGramFilterFactory. What I am not sure about is how to configure Solr to search for similar titles. Here are my concrete questions:
- Does it make sense to use the dismax query parser when I am trying to query similar titles?
- How can the mm (Minimum 'Should' Match) parameter help me find similar titles?
- On what basis should I choose the minimum and maximum n-gram sizes (minGramSize and maxGramSize)?
Thanks for any feedback.
There are several things you could do: