Solr FuzzyLookupFactory exactMatch is case sensitive


This may be a duplicate question, but I couldn't find anything relevant:

I have implemented a Solr suggester for a list of cities and areas, using FuzzyLookupFactory. My schema looks like this:

  <fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
              <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
              <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
  </fieldType>

synonyms.txt is used to map older city names to new ones, e.g. Madras => Chennai, Saigon => Ho Chi Minh City.

My suggester definition looks like this:

  <searchComponent name="suggest" class="solr.SuggestComponent">
        <lst name="suggester">
              <str name="name">suggestions</str>
              <str name="lookupImpl">FuzzyLookupFactory</str>
              <str name="dictionaryImpl">DocumentDictionaryFactory</str>
              <str name="field">searchfield</str>
              <str name="weightField">searchscore</str>
              <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
              <str name="buildOnStartup">false</str>
              <str name="buildOnCommit">false</str>
              <str name="storeDir">autosuggest_dict</str>
        </lst>
  </searchComponent>

My request handler looks like this:

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
        <lst name="defaults">
                <str name="suggest">true</str>
                <str name="suggest.count">10</str>
                <str name="suggest.dictionary">suggestions</str>
                <str name="suggest.dictionary">results</str>
        </lst>
        <arr name="components">
                <str>suggest</str>
        </arr>
  </requestHandler>

Now the problem is that the suggester shows exact matches first, but the exact match is case sensitive. For example,

/suggest?suggest.q=mumbai (starting with a lower case "m")

returns the exact match in 4th place:

{
  "responseHeader":{
    "status":0,
    "QTime":19},
  "suggest":{
    "suggestions":{
      "mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
          {
            "term":"Mumbai",
            "weight":2248},
.....

Whereas calling /suggest?suggest.q=Mumbai (starting with an upper case "M")

returns the exact match in 1st place:

{
  "responseHeader":{
    "status":0,
    "QTime":16},
  "suggest":{
    "suggestions":{
      "Mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai",
            "weight":2248},
          {
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
...

What am I missing here? What can be done to make "Mumbai" the first result even when the query is the lower case "mumbai"? I thought case sensitivity was handled by the "suggestTypeLc" field type I defined.


There is 1 answer

Pavel Vasilev answered:

FuzzyLookupFactory has a less obvious config parameter, exactMatchFirst, which is described as:

If true, the default, exact suggestions are returned first, even if they are prefixes or other strings in the FST have larger weights.

According to your config, suggestions are ranked by the searchscore field (set via <str name="weightField">searchscore</str>). This is why, when you query for mumbai, all suggestions are sorted by weight.

But with exactMatchFirst=true, Mumbai comes out on top for the query Mumbai regardless of the configured weights. That is how exactMatchFirst affects the ordering.
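
For reference, here is a sketch of where the parameter would sit in the suggester definition from the question. I have not run this against your setup. The value shown is the default, so the line is only there to make the behaviour explicit; setting it to false should order results purely by weight for both spellings.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <!-- Explicit for illustration only: true is already the default.
         Set to false to rank by weight alone. -->
    <str name="exactMatchFirst">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
```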

Unfortunately, I didn't find any option for tuning your suggester other than getting rid of weightField altogether.

Try turning off weighting by field, or alternatively try another lookup implementation, for instance AnalyzingInfixLookupFactory.
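
A minimal sketch of the second option, reusing the names from the question. It is untested, and the indexPath value is a made-up directory name; AnalyzingInfixLookupFactory keeps its own index on disk, so it is pointed at a separate location here. Whether "Mumbai" ends up first for a lower case query will still depend on the weights, so treat this as a starting point for experiments. To try the first option instead, remove the weightField line.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <!-- Swapped in for FuzzyLookupFactory -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <!-- Remove this line to turn off weighting by field -->
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <!-- Hypothetical directory name for the infix suggester's index -->
    <str name="indexPath">autosuggest_infix_index</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

After changing the lookup implementation, the suggester has to be rebuilt (for example with suggest.build=true on a request), since buildOnStartup and buildOnCommit are both false.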