How to get distance score when searching Solr 9 on a DenseVectorField

I've created a Solr index (version 9.3.0) of some poems and nursery rhymes. I'm trying to search for related poems and nursery rhymes, and I want to get back the dot_product distance for each matching document, but I cannot find any way to get that information. Here's the field I added to Solr in the managed-schema file:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="384"
  similarityFunction="dot_product"  knnAlgorithm="hnsw" 
  hnswMaxConnections="16" hnswBeamWidth="50"/>
<field name="bge_small_vector" type="knn_vector" indexed="true" stored="true"/>

Here's the Python code that I'm using to query the Solr index:

import pprint

import pysolr
from sentence_transformers import SentenceTransformer

pp = pprint.PrettyPrinter(indent=4, width=100)

solr = pysolr.Solr('http://localhost:8983/solr/docindex')
model = SentenceTransformer('BAAI/bge-small-en-v1.5')
document = '''Three blind mice. Three blind mice.
See how they run. See how they run.
They all ran after the farmer's wife,
Who cut off their tails with a carving knife.
Did you ever see such a sight in your life
As three blind mice?'''
embedding = model.encode(document, normalize_embeddings=True, convert_to_numpy=True)

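# Send the embedding to Solr's KNN query parser and ask for debug output.
solr_response = solr.search(
    q=r'{!knn f=bge_small_vector topK=10}[' + ",".join(f'{a:.12f}' for a in embedding) + ']',
    rows=10,
    start=0,
    debugQuery="true",
    wt='json',
)

# Print each matching document, then the debug section of the response.
for item in solr_response:
    pp.pprint(item)

pp.pprint(solr_response.debug)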

The only reference to distance I can find is in the debug response and it is not specific to any document:

{   'QParser': 'KnnQParser',
    'explain': {'': '\n0.81944466 = within top 10\n'},
    'parsedquery': 'KnnVectorQuery(KnnVectorQuery:bge_small_vector[-0.02721269,...][10])',
    'parsedquery_toString': 'KnnVectorQuery:bge_small_vector[-0.02721269,...][10]',
    ...
}

Does anyone know how to get Solr to return the distance for each document in a DenseVectorField query?

There is 1 answer

fakecoder

The OpenSearch documentation at https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/ shows how distances are converted to scores in OpenSearch. I tested the L2 case: score = 1 / (1 + distance), so distance = (1 / score) - 1. To get the Euclidean distance, you may need to take the square root of the result.
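The inversion above can be sketched in Python. This is only a minimal sketch, not a confirmed Solr recipe. The function names are hypothetical. The L2 helper assumes the distance in the formula is the squared L2 distance. The dot_product helper assumes the Lucene convention score = (1 + dot_product) / 2 for float vectors, which is not stated in the question and should be checked against your Solr version.

```python
import math


def l2_score_to_squared_distance(score: float) -> float:
    """Invert score = 1 / (1 + d), assuming d is the squared L2 distance."""
    if not 0.0 < score <= 1.0:
        raise ValueError("an L2 score is expected to lie in (0, 1]")
    # Clamp at zero to absorb tiny floating-point error near score == 1.
    return max(0.0, 1.0 / score - 1.0)


def l2_score_to_distance(score: float) -> float:
    """Euclidean distance: the square root of the squared distance."""
    return math.sqrt(l2_score_to_squared_distance(score))


def dot_product_score_to_similarity(score: float) -> float:
    """Invert score = (1 + dot) / 2 (assumed mapping for dot_product)."""
    return 2.0 * score - 1.0


if __name__ == "__main__":
    # 0.81944466 is the score shown in the question's debug output.
    print(dot_product_score_to_similarity(0.81944466))
    print(l2_score_to_distance(0.5))
```

To apply these per document, each result needs to carry its score. With pysolr that would typically mean passing fl='*,score' to solr.search and reading item['score'] from each result. This relies on Solr's standard score pseudo-field and was not tested against the index in the question.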