Solr5 search not displaying results based on score


I am implementing Solr search, but the results are not ordered the way I expect based on score. For example, I search with the keywords .net ios. I have a field KeySkills which holds the following data:

KeySkills:Android, ios, Phonegap, ios
KeySkills:.net, .net, .net, MVC, HTML, CSS

Here, when I search with .net ios as the keywords, the document .net, .net, .net, MVC, HTML, CSS should come first in the results with the higher score, because it contains .net three times, but I am getting the reverse order.

Is there a setting in the Solr config file or in schema.xml that achieves this, or how can I sort the results by the maximum number of occurrences of the search string? Please help me solve this.

This is the result I get:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": ".net ios",
      "_": "1434345788751",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "KeySkills": "Android, ios, Phonegap, ios",
        "_version_": 1504020323727573000,
        "score": 0.47567564
      },
      {
        "KeySkills": "net, net, net, MVC, HTML, CSS",
        "_version_": 1504020323675144200,
        "score": 0.4726259
      }
    ]
  }
}

There is 1 answer

alexf

As you can see in Lucene's documentation, the score is not determined only by the number of matching terms:

score(q,d) = coord(q,d) · queryNorm(q) · ∑ over t in q ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Where tf(t in d) correlates to the term's frequency, defined as the number of times term t appears in the currently scored document d.

idf(t) stands for Inverse Document Frequency. This value correlates to the inverse of docFreq (the number of documents in which the term t appears). This means rarer terms give higher contribution to the total score.
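
As a rough illustration of these two factors, here is a minimal Python sketch. It assumes the classic TF-IDF similarity (Lucene's DefaultSimilarity), where tf is the square root of the raw frequency and idf is 1 + ln(numDocs / (docFreq + 1)). The function names and the two-document index are made up for the example.

```python
import math

def tf(freq):
    # Classic similarity: square root of the raw term frequency,
    # so extra occurrences help less and less.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Classic similarity: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical two-document index in which each query term
# appears in exactly one document.
NUM_DOCS = 2

print("tf for 2 occurrences:", tf(2))
print("tf for 3 occurrences:", tf(3))
print("idf for a term found in 1 of 2 docs:", idf(1, NUM_DOCS))
```

Under these assumptions, three occurrences instead of two raise tf by only about 22%, so the raw count alone does not dominate the score.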

coord(q,d) is a score factor based on how many of the query terms are found in the specified document.

t.getBoost() is a search time boost of term t in the query q as specified in the query text.

norm(t,d) encapsulates a few (indexing time) boost and length factors:

  • Field boost
  • lengthNorm computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.

When a document is added to the index, all the above factors are multiplied. If the document has multiple fields with the same name, all their boosts are multiplied together:

norm(t,d) = lengthNorm · ∏ f.boost()
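
To see how the length factor interacts with tf, here is a small Python sketch. It assumes the classic similarity, where lengthNorm is 1 / sqrt(number of tokens in the field), a field boost of 1.0, and a tokenizer that yields 4 tokens for the first document and 6 for the second. It also ignores the lossy compression Lucene applies when storing norms, so it will not reproduce the exact scores from the question.

```python
import math

def tf(freq):
    # Classic similarity: square root of the raw term frequency.
    return math.sqrt(freq)

def length_norm(num_tokens, field_boost=1.0):
    # Classic similarity: shorter fields get a larger norm.
    return field_boost / math.sqrt(num_tokens)

# (occurrences of the matching term, total tokens in the field);
# the token counts are assumptions about how the field is analyzed.
docs = {
    "Android, ios, Phonegap, ios": (2, 4),
    ".net, .net, .net, MVC, HTML, CSS": (3, 6),
}

weights = {
    text: tf(freq) * length_norm(num_tokens)
    for text, (freq, num_tokens) in docs.items()
}

for text, weight in weights.items():
    print(f"{weight:.4f}  {text}")
```

With these numbers the product tf · lengthNorm comes out the same for both documents (about 0.7071): the third occurrence of .net is cancelled out by the longer field, so small effects such as norm precision can decide the final order.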

So here I guess that "KeySkills": "Android, ios, Phonegap, ios" ranks before your other document because it contains fewer words, which gives it a higher lengthNorm.

To check that, you can use the explain.solr.pl tool.
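
Solr can also return the per-document score breakdown itself when the request includes debugQuery=true. The sketch below only builds such a request URL in Python; the host, port and core name (mycore) are placeholders, not values from the question.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "q": ".net ios",
    "fl": "*,score",       # return stored fields plus the score
    "wt": "json",
    "indent": "true",
    "debugQuery": "true",  # adds a debug section with score explanations
}

url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

Opening that URL in a browser should add a "debug" section to the response, with an "explain" entry per document listing the tf, idf and fieldNorm values that went into its score.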