I am trying to run a fuzzy search on a title field, but Solr returns no results for the query "hilfinger", a misspelling of the brand name "hilfiger":

http://rex:8983/solr/project/select?fq=white_label_id%3A6&q=title%3Ahilfinger~
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"title:hilfinger~",
      "fq":"white_label_id:6",
      "_":"1554887612686"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}

A standard search with the correct spelling, hilfiger, does return results:

http://rex:8983/solr/project/select?fq=white_label_id%3A6&q=title%3Ahilfiger
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"title:hilfiger",
      "fq":"white_label_id:6",
      "_":"1554887612686"}},
  "response":{"numFound":27,"start":0,"docs":[
      {

Is there something I need to activate in the Solr configuration to enable fuzzy search, or is there another reason I am getting 0 results?

1 Answer

Kusal Hettiarachchi On

Yes, you can. You have to configure the fields in schema.xml on which you want to enable fuzzy search or partial matches. Add a filter to the field's index-time analyzer to tell Solr to index the n-grams of each value in addition to the original value. Fuzzy-style partial matching can then be performed on this field. Solr ships with two such filters, EdgeNGramFilterFactory and NGramFilterFactory, and you only have to attach one of them to the filter chain of your index analyzer.

In both cases, you have to define the minimum and maximum size of the n-grams to generate at index time. (Note that this increases the size of your index, and that existing documents must be reindexed before the change takes effect.) Here is a field type for your title field, defined in schema.xml with such a filter:

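<fieldType name="title" class="solr.TextField" positionIncrementGap="100">
   <!-- Index time: split on non-letters, lowercase, then emit edge n-grams of 2 to 15 characters -->
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
   </analyzer>
   <!-- Query time: split and lowercase only, so the query term is compared against the indexed n-grams -->
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
</fieldType>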

This configuration sets the minimum n-gram size to 2 characters and the maximum to 15. EdgeNGramFilterFactory only generates n-grams anchored at the start of each token, so it supports prefix matching. To enable partial matching anywhere in the field value, replace the line

<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>

with,

<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
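
To make the difference between the two filters concrete, here is a minimal Python sketch of which grams each one would produce for the token "hilfiger" with minGramSize="2" and maxGramSize="15". The helper names edge_ngrams and ngrams are made up for this illustration and are not part of Solr. The sketch only reproduces the set of grams, not the order or token positions that Lucene actually emits.

```python
def edge_ngrams(token, min_size=2, max_size=15):
    """Prefixes of the token with lengths min_size..max_size (front edge only)."""
    upper = min(max_size, len(token))
    return [token[:n] for n in range(min_size, upper + 1)]


def ngrams(token, min_size=2, max_size=15):
    """All substrings of the token with lengths min_size..max_size."""
    upper = min(max_size, len(token))
    return [
        token[i:i + n]
        for n in range(min_size, upper + 1)
        for i in range(len(token) - n + 1)
    ]


title_token = "hilfiger"  # already lowercased, as the tokenizer would do
edge = edge_ngrams(title_token)
every = ngrams(title_token)

print(edge)                                   # prefixes only
print(len(edge), len(every))                  # the full n-gram set is much larger
print("figer" in edge, "figer" in every)      # mid-word match needs NGramFilterFactory
```

A query term only matches if it is identical to one of the indexed grams, so it is worth running your own search terms through the Analysis screen of the Solr admin UI to check what the field actually produces.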