Retrieve a known single index item vector similarity score in Redis Stack


Say I have two indexes with the following schema.

SCHEMA = [
  TextField("id"),
  VectorField("embedding", "FLAT", {"TYPE": "FLOAT32", "DIM": 768, "DISTANCE_METRIC": "COSINE"}),
]

I have two known ids, one from each index. Is it possible to retrieve the vector similarity score between those two objects from Redis? Here is some example Python code showing roughly how the docs describe querying.

query = ???
query_vector = redis.hget(key="embedding", name="products:23")
results = redis.ft("services").search(query, query_params={"vector": query_vector})

A. Guy On BEST ANSWER

If you know what two documents you want to get the distance between, the best way I can think of for getting it would be

  1. Get the embedding of one of the docs.
  2. Perform a hybrid query whose pre-filter matches only the second doc.

To achieve the second step, you can try having some field (tag or numeric for example) with a unique value for each document (like the doc name itself) and look for it before performing the KNN query.

You can also try using the INKEYS query parameter to limit the search to the second document only. From the documentation:

INKEYS {num} {attribute} ... limits the result to a given set of keys specified in the list. The first argument must be the length of the list and greater than zero. Non-existent keys are ignored unless all the keys are non-existent.
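As a rough sketch of the INKEYS idea, the raw FT.SEARCH arguments could be assembled like this. The helper name build_inkeys_search and the key "services:7" are made up for illustration, the parameter name BLOB is just a convention, and I have not run this against a live server, so treat it as a starting point rather than a verified recipe.

```python
def build_inkeys_search(index, target_key, query_vector):
    """Assemble FT.SEARCH arguments that restrict a KNN query to one key.

    index        -- name of the search index (e.g. "services")
    target_key   -- Redis key of the second document (hypothetical example)
    query_vector -- raw FLOAT32 bytes of the first document's embedding
    """
    return [
        "FT.SEARCH", index,
        "*=>[KNN 1 @embedding $BLOB]",    # KNN over the "embedding" field
        "INKEYS", 1, target_key,          # list length first, then the key
        "PARAMS", 2, "BLOB", query_vector,
        "DIALECT", 2,                     # vector queries need dialect 2+
    ]
```

With redis-py the list can be sent as redis.execute_command(*build_inkeys_search("services", "services:7", query_vector)), where query_vector is the bytes value returned by HGET on a client created without decode_responses=True.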

Finally, hybrid queries in RediSearch use heuristics to decide how to execute the query. If you’re sure the pre-filter passes a single doc, you can specify the policy explicitly instead of relying on the heuristic, to make sure it uses ad-hoc brute force. That mode takes the docs that passed the filter and compares them directly to the query vector, instead of the general flat approach (as you have a FLAT index in your example) of comparing the query against the entire dataset. The heuristic will probably choose this mode automatically, but you can force it by adding the HYBRID_POLICY parameter to the query:

<filter here> =>[KNN 1 @embedding $BLOB HYBRID_POLICY ADHOC_BF]
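To make the filter part concrete, here is a minimal sketch that builds the whole query string for a tag pre-filter. It assumes you have added a TAG field (called doc_id here, a hypothetical name that is not in your schema) holding a unique value per document. The escaping rule is my assumption that backslash-escaping every non-alphanumeric character is acceptable in a tag query.

```python
import re


def escape_tag(value):
    """Backslash-escape punctuation and spaces so the tag value parses as one token."""
    return re.sub(r"([^0-9A-Za-z_])", r"\\\1", value)


def build_pair_query(tag_field, tag_value, vector_field="embedding", param="BLOB"):
    """Build a hybrid query: a tag pre-filter followed by a forced ad-hoc KNN."""
    tag_filter = f"(@{tag_field}:{{{escape_tag(tag_value)}}})"
    knn = f"[KNN 1 @{vector_field} ${param} HYBRID_POLICY ADHOC_BF]"
    return f"{tag_filter}=>{knn}"


# Example with a made-up tag value:
print(build_pair_query("doc_id", "services:7"))
```

The resulting string goes in as the query, with the first document's embedding bytes passed as the BLOB parameter and dialect 2 selected. If I remember correctly, the distance then comes back on the matched document in a field named __embedding_score unless you alias it.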

And one last note: if you are only looking for the distance between two vectors, you might want to simply HGET the two embeddings and perform the computation locally using NumPy or any other library.
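A minimal sketch of the local computation, using only the standard library so it runs anywhere. It assumes the embeddings were stored as little-endian FLOAT32 bytes, which is what you typically get from a NumPy float32 array's tobytes() on common hardware. The function names are mine. With NumPy the same thing is np.frombuffer(blob, dtype=np.float32) followed by a dot product and two norms.

```python
import math
import struct


def unpack_float32(blob):
    """Decode raw bytes into a list of floats (assumes little-endian FLOAT32)."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the distance the COSINE metric reports."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Usage would be something like cosine_distance(unpack_float32(blob_a), unpack_float32(blob_b)), where each blob is the bytes value returned by HGET for the embedding field (on a client created without decode_responses=True).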

Hope that helps!