This may be a duplicate question, but I couldn't find anything relevant:
I have implemented a Solr suggester for a list of cities and areas, using FuzzyLookupFactory. My schema looks like this:
<fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
synonyms.txt is used to map older city names to their new ones, e.g. Madras=>Chennai, Saigon=>Ho Chi Minh city
My suggester definition looks like this:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
</searchComponent>
My request handler looks like this:
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestions</str>
    <str name="suggest.dictionary">results</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
The problem is that the suggester shows exact matches first, but only when the case matches. For example,
/suggest?suggest.q=mumbai (starting with a lower-case "m")
returns the exact match in 4th place:
{
  "responseHeader":{
    "status":0,
    "QTime":19},
  "suggest":{
    "suggestions":{
      "mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
          {
            "term":"Mumbai",
            "weight":2248},
          .....
Whereas calling /suggest?suggest.q=Mumbai (starting with an upper-case "M")
returns the exact match in 1st place:
{
  "responseHeader":{
    "status":0,
    "QTime":16},
  "suggest":{
    "suggestions":{
      "Mumbai":{
        "numFound":10,
        "suggestions":[{
            "term":"Mumbai",
            "weight":2248},
          {
            "term":"Mumbai Domestic Airport",
            "weight":11536},
          {
            "term":"Mumbai Chhatrapati Shivaji Intl Airport",
            "weight":11376},
          {
            "term":"Mumbai Pune Highway",
            "weight":2850},
          ...
What am I missing here? What can be done to make Mumbai the first result even when the query is the lower-case "mumbai"? I thought case sensitivity was being handled by the "suggestTypeLc" field type I defined.
There is a hidden config parameter for FuzzyLookupFactory, exactMatchFirst. When it is true, exact matches are returned first, even if other suggestions have larger weights.
According to your config, suggestions are ranked by the searchscore field (in your config it is referenced by <str name="weightField">searchscore</str>). This is why, when you query for mumbai, all suggestions are sorted by weight.
But with exactMatchFirst=true you get Mumbai on top (for the query Mumbai) regardless of the weights. That is how exactMatchFirst affects the ordering. Judging by your two responses, the exact-match check appears to compare the query against the stored term as-is, so the lower-case query does not count as an exact match.
Unfortunately, I didn't find an option for tuning your suggester other than getting rid of weightField altogether. Try turning off weighting by field, or alternatively try another lookup implementation, for instance AnalyzingInfixLookupFactory.
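For illustration, here is a rough sketch of what those two options could look like in solrconfig.xml. Treat it as a starting point rather than a verified fix: the suggester name suggestionsInfix and the directory name autosuggest_infix_index are made up for this example, exactMatchFirst is spelled out only to make it visible (as far as I know it is already on by default), and I have not checked that either variant actually puts Mumbai first for a lower-case query.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- Option 1: same fuzzy suggester, but without weightField,
       so that the stored weights no longer drive the ordering. -->
  <lst name="suggester">
    <str name="name">suggestions</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <!-- Shown explicitly for clarity; assumed to be the default already. -->
    <str name="exactMatchFirst">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="storeDir">autosuggest_dict</str>
  </lst>
  <!-- Option 2: a second suggester using a different lookup implementation.
       The name and indexPath below are hypothetical. -->
  <lst name="suggester">
    <str name="name">suggestionsInfix</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">searchfield</str>
    <str name="weightField">searchscore</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
    <str name="indexPath">autosuggest_infix_index</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

Since both suggesters have buildOnStartup and buildOnCommit set to false, each one has to be rebuilt explicitly (for example with suggest.build=true on a request) before its results reflect the new config. To compare them, pass suggest.dictionary=suggestions or suggest.dictionary=suggestionsInfix and run the same mumbai / Mumbai queries against each.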