I've been working with Hibernate Search for months now, but I still can't make sense of the relevance ranking it produces. Overall I'm satisfied with the results it returns, but even the simplest test doesn't match my expectations.
My first test used term frequency (tf). Data:
- word
- word word
- word word word
- word word word word
- word word word word word
- word word word word word word
Results I get:
- word
- word word word word
- word word word word word
- word word word word word word
- word word
- word word word
I'm really confused by this scoring. My query is quite complex, but since no other field was involved in this test, it can be simplified to: booleanjunction.should(phraseQuery).should(keywordQuery).should(fuzzyQuery)
My analyzer uses the following filters:
- StandardFilterFactory
- LowerCaseFilterFactory
- StopFilterFactory
- SnowballPorterFilterFactory for English
My Explanation object: https://jsfiddle.net/o51kh3og/
Score calculation is quite complex. You have to start from the basic equation:
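For reference, and assuming the default TF-IDF similarity is in use, Lucene's practical scoring function as given in the TFIDFSimilarity documentation has this shape (boost(t) stands for the per-term query boost):

```latex
$$
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)
$$
```

In an Explanation, the per-term part shows up as fieldWeight = tf * idf * fieldNorm, where fieldNorm is the stored norm(t,d).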
As you said, you have tf, which stands for term frequency; its value is the square root of the frequency of the term.
But as you can see in your explanation, you also have norm (aka fieldNorm), which is used in the fieldWeight calculation. Let's take your example:
Here, eklavya has a better score than the other because fieldWeight is the product of tf, idf and fieldNorm. This last one is higher for the eklavya document because it contains only one term.
As the documentation above says:
The more terms you have in a field, the lower fieldNorm will be. Be careful with the value of this field.
So, to conclude, this is a good example showing that the score is calculated not only from the term frequency but also from the number of terms in your field.
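To make this concrete, here is a minimal sketch of tf * fieldNorm for the six test documents from the question. It rests on several assumptions: the classic TF-IDF similarity (tf = sqrt(freq), length norm = 1/sqrt(number of terms)), no boosts, and norms stored in a single lossy byte as older Lucene versions do via SmallFloat.floatToByte315. The quantizeNorm helper is a hypothetical stand-in for that encoding, not a Lucene call; it just truncates the float to its top two mantissa bits and ignores range clamping. idf is left out because it is the same for every document in this test.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldWeightDemo {

    // Hypothetical helper (assumption): imitates lossy one-byte norm storage by
    // keeping only the sign, the exponent and the top two mantissa bits.
    static float quantizeNorm(float norm) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(norm) & 0xFFE00000);
    }

    // tf * fieldNorm for a field that is `length` repetitions of one term.
    static float fieldWeight(int length) {
        float tf = (float) Math.sqrt(length);                     // sqrt of term frequency
        float fieldNorm = quantizeNorm((float) (1.0 / Math.sqrt(length)));
        return tf * fieldNorm;
    }

    // Field lengths 1..maxLength ordered by descending tf * fieldNorm.
    // List.sort is stable, so ties keep their original (index) order.
    static List<Integer> rankedLengths(int maxLength) {
        List<Integer> lengths = new ArrayList<>();
        for (int n = 1; n <= maxLength; n++) {
            lengths.add(n);
        }
        lengths.sort(Comparator.comparing((Integer n) -> fieldWeight(n)).reversed());
        return lengths;
    }

    public static void main(String[] args) {
        for (int n : rankedLengths(6)) {
            System.out.printf("%d term(s): tf*fieldNorm = %.4f%n", n, fieldWeight(n));
        }
    }
}
```

Under these assumptions the lengths come out in the order 1, 4, 5, 6, 2, 3, which is the ordering reported in the question. Without the lossy norm, sqrt(n) * 1/sqrt(n) would be 1 for every document, so the ranking here would come entirely from how the norm is rounded when it is stored.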