I'm not sure I understand how the vector space model is used in Lucene scoring.
I read here (https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html) that Lucene scores a document as the sum of the tf-idf weights of each query term (if we omit the coordination factor, the field-length norm and boosts). I don't understand how the vector space model is used.
The vector space model could be used to calculate the similarity between the tf-idf vector of a document and the tf-idf vector of the query. This would give a cosine similarity score between the query and a document. The score would lie between 0 and 1, so scores from different requests should be easy to compare.
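A minimal Python sketch of that cosine approach, assuming documents are already-analyzed term lists and borrowing Lucene's classic definitions tf = √frequency and idf = 1 + ln(numDocs / (docFreq + 1)) (any positive tf-idf weighting would behave the same way here). The corpus and the function names `idf`, `tfidf_vector` and `cosine` are hypothetical. Because every weight is positive, the cosine always falls in [0, 1].

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-analyzed terms.
DOCS = [
    ["quick", "brown", "fox"],
    ["quick", "quick", "dog"],
    ["lazy", "brown", "dog", "sleeps"],
]

def idf(term, docs):
    # Assumed Lucene-classic idf: 1 + ln(numDocs / (docFreq + 1)), always > 0.
    doc_freq = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def tfidf_vector(terms, docs):
    # Map each distinct term to tf * idf, with tf = sqrt(raw frequency).
    counts = Counter(terms)
    return {t: math.sqrt(c) * idf(t, docs) for t, c in counts.items()}

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

q_vec = tfidf_vector(["quick", "dog"], DOCS)
for i, doc in enumerate(DOCS):
    print(i, round(cosine(q_vec, tfidf_vector(doc, DOCS)), 4))
```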
Why doesn't Lucene score documents this way?
Lucene uses the 'practical scoring function' mentioned in your link, which is an approximation of cosine similarity, extended to support 'practical' features such as boosts.
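A minimal Python sketch of that practical scoring function, under stated assumptions: all boosts are 1, and tf = √frequency, idf = 1 + ln(numDocs / (docFreq + 1)) and norm = 1/√numTerms follow the definitions the Elasticsearch guide gives for Lucene's classic TF/IDF similarity. The corpus and the name `practical_score` are hypothetical; this is an illustration, not Lucene's implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-analyzed terms.
DOCS = [
    ["quick", "brown", "fox"],
    ["quick", "quick", "dog"],
    ["lazy", "brown", "dog", "sleeps"],
]

def idf(term, docs):
    # idf(t) = 1 + ln(numDocs / (docFreq + 1))
    doc_freq = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def practical_score(query_terms, doc_terms, docs):
    terms = set(query_terms)  # each query term counted once (query tf = 1)
    freqs = Counter(doc_terms)
    # queryNorm = 1 / sqrt(sumOfSquaredWeights); all boosts assumed to be 1
    sum_of_squared_weights = sum(idf(t, docs) ** 2 for t in terms)
    query_norm = 1.0 / math.sqrt(sum_of_squared_weights)
    # coord = fraction of the query terms that occur in the document
    matched = [t for t in terms if freqs[t] > 0]
    coord = len(matched) / len(terms)
    # field-length norm: depends only on the number of terms in the document
    norm = 1.0 / math.sqrt(len(doc_terms))
    total = sum(math.sqrt(freqs[t]) * idf(t, docs) ** 2 * norm for t in matched)
    return query_norm * coord * total

for i, doc in enumerate(DOCS):
    print(i, round(practical_score(["quick", "dog"], doc, DOCS), 4))
```

Nothing in this product forces the result into [0, 1]: for a one-term query, queryNorm cancels one idf factor and leaves √freq · idf(t) / √numTerms, which exceeds 1 for a short document containing a rare term.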
If you take the vector space cosine similarity formula for a query q and a document d, you have:

$$\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \cdot \lVert d \rVert}$$
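Considering that q and d are vectors like [tf(t1) * idf(t1), ...], and that in the q vector tf(t) is either 1 or 0, the formula becomes:

$$\cos(q, d) = \frac{\sum_{t \in q} \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)^2}{\lVert q \rVert \cdot \lVert d \rVert}$$

You can further replace ||q|| with 1 / queryNorm(q), given their definition queryNorm = 1 / √sumOfSquaredWeights (with tf = 1 and no boosts, sumOfSquaredWeights is the sum of idf(t)² over the query terms, i.e. exactly ||q||²):

$$\cos(q, d) = \mathrm{queryNorm}(q) \cdot \frac{\sum_{t \in q} \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)^2}{\lVert d \rVert}$$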
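which is close to the formula they give in the docs:

$$\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t, d) \right)$$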
||d||, the norm of the document vector, however, has no direct equivalent in their formula.
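The derivation in this answer can be checked numerically. The sketch below assumes Lucene's classic definitions (tf = √frequency, idf = 1 + ln(numDocs / (docFreq + 1)), field-length norm = 1/√numTerms) and a hypothetical toy corpus; `cosine`, `cosine_rewritten` and `length_norm` are illustrative names, not Lucene APIs. It confirms that queryNorm · Σ tf · idf² / ||d|| equals the cosine, and shows why ||d|| has no direct counterpart: two documents of the same length get the same length norm, yet different ||d||, because ||d|| also depends on the idf of terms the query never mentions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: "the" is common, "aardvark" is rare.
DOCS = [
    ["fox", "the"],
    ["fox", "aardvark"],
    ["the", "dog"],
    ["the", "cat"],
]

def idf(term, docs):
    doc_freq = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def tfidf(terms, docs):
    # tf = sqrt(raw frequency), as in Lucene's classic similarity
    return {t: math.sqrt(c) * idf(t, docs) for t, c in Counter(terms).items()}

def l2(vec):
    return math.sqrt(sum(w * w for w in vec.values()))

def cosine(query_terms, doc_terms, docs):
    # Direct cosine between the query vector (tf = 1) and the document vector
    q = {t: idf(t, docs) for t in set(query_terms)}
    d = tfidf(doc_terms, docs)
    return sum(w * d.get(t, 0.0) for t, w in q.items()) / (l2(q) * l2(d))

def cosine_rewritten(query_terms, doc_terms, docs):
    # queryNorm(q) * sum over t in q of tf(t, d) * idf(t)^2, divided by ||d||
    terms = set(query_terms)
    query_norm = 1.0 / math.sqrt(sum(idf(t, docs) ** 2 for t in terms))
    freqs = Counter(doc_terms)
    s = sum(math.sqrt(freqs[t]) * idf(t, docs) ** 2 for t in terms)
    return query_norm * s / l2(tfidf(doc_terms, docs))

def length_norm(doc_terms):
    # Field-length norm: depends only on how many terms the document has
    return 1.0 / math.sqrt(len(doc_terms))

for doc in DOCS[:2]:
    print(doc,
          round(cosine(["fox"], doc, DOCS), 4),
          round(cosine_rewritten(["fox"], doc, DOCS), 4),
          round(l2(tfidf(doc, DOCS)), 4),
          round(length_norm(doc), 4))
```

Both documents contain "fox" once and have two terms, so a score built on the length norm cannot tell them apart, while the cosine ranks the first one higher because its other term is common and contributes less to ||d||.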