Reciprocal rank fusion using PyTorch

I'm dealing with a large-scale (10M+) retrieval problem with q queries and D documents. I've computed the top k nearest documents for each query using 3 embedding models, and now I want to rerank these 3 sets of results using reciprocal rank fusion (RRF). All the implementations I could find use a Python for loop, which doesn't seem feasible: iterating sequentially over this many queries would take far too long.
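For reference, the usual RRF score of a document d uses only ranks, not raw scores. Writing the smoothing constant as c (commonly set to 60) to avoid a clash with the top k above, and M for the number of embedding models:

```latex
\mathrm{RRF}(d) = \sum_{m=1}^{M} \frac{1}{c + \mathrm{rank}_m(d)}
```

Here rank_m(d) is the 1-based position of d in model m's top-k list, and a model whose list does not contain d contributes nothing. The fused result is the k documents with the highest RRF(d).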

For clarity, my similarity matrices look like this:

Embed_1: "query_1": {"doc_10": 0.3, "doc_11": 0.37, "doc_94": 0.38, "doc_1": 0.5, ...}
Embed_2: "query_1": {"doc_5": 0.06, "doc_96": 0.09, "doc_10": 0.12, "doc_8": 0.3, ...}
Embed_3: "query_1": {"doc_11": 0.49, "doc_2": 0.82, "doc_37": 0.97, "doc_4": 1.0, ...}
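Since RRF only needs each model's ranking, one possible first step is to turn each dictionary into an integer tensor of document indices in rank order. This is a sketch under assumptions: `results_to_tensor` and the `doc_to_idx` mapping (document-ID string to integer, shared by all models) are hypothetical names, and the score direction is left as a parameter because the example values increase down each list, which fits either distances (lower is better) or similarities listed in ascending order. It is still a plain Python loop, run once per model; if the search step already produces an index array, `torch.from_numpy` can wrap it and this step can be skipped.

```python
import torch


def results_to_tensor(results, doc_to_idx, k, higher_is_better=True):
    """Convert one model's {query_id: {doc_id: score}} results into a rank-ordered index tensor.

    Returns (query_ids, ids) where ids[b, r] is the integer index of the document
    at rank r + 1 for query_ids[b]. Assumes every query has at least k results.
    """
    query_ids = sorted(results)  # fixed query order, to be shared across all models
    rows = []
    for qid in query_ids:
        # Rank by score; pass higher_is_better=False if the scores are distances.
        ranked = sorted(results[qid].items(), key=lambda kv: kv[1], reverse=higher_is_better)
        rows.append([doc_to_idx[doc_id] for doc_id, _ in ranked[:k]])
    return query_ids, torch.tensor(rows, dtype=torch.long)


if __name__ == "__main__":
    embed_1 = {"query_1": {"doc_10": 0.3, "doc_11": 0.37, "doc_94": 0.38, "doc_1": 0.5}}
    # Hypothetical mapping; in practice built once, e.g. {d: i for i, d in enumerate(all_doc_ids)}.
    doc_to_idx = {"doc_10": 10, "doc_11": 11, "doc_94": 94, "doc_1": 1}
    qids, ids = results_to_tensor(embed_1, doc_to_idx, k=4, higher_is_better=False)
    print(qids, ids.tolist())  # ['query_1'] [[10, 11, 94, 1]]
```

Stacking the three resulting tensors with `torch.stack` gives a single (models, queries, k) tensor that can be moved to the GPU.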

I want the top k document IDs for query_1, reranked using RRF. I tried multiprocessing, but the machine's CPU is the bottleneck. If this could be parallelised on a GPU (using PyTorch), the job would finish much sooner.
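One way the fusion itself might be vectorized in PyTorch, sketched under assumptions: the rankings are stacked into a LongTensor `ranked_ids` of shape (models, queries, k) holding integer document indices in rank order, every query has exactly k results per model, and no document appears twice in one model's list. The names `rrf_fuse`, `rrf_fuse_chunked` and the `chunk_size` default are illustrative. Instead of a per-query dictionary, it compares all m*k candidates of each query pairwise, sums the 1/(c + rank) contributions of identical documents, masks the repeated copies, and calls `topk`, so a whole batch of queries is handled by a few tensor operations.

```python
import torch


def rrf_fuse(ranked_ids: torch.Tensor, top_k: int, c: float = 60.0) -> torch.Tensor:
    """Batched reciprocal rank fusion (sketch).

    ranked_ids: LongTensor (m, q, k); ranked_ids[i, b, r] is the document index
        that model i ranked at position r + 1 for query b.
    Returns a LongTensor (q, top_k) of fused document indices, best first.
    Assumes top_k <= number of distinct candidates per query.
    """
    m, q, k = ranked_ids.shape
    n = m * k
    device = ranked_ids.device
    # RRF contribution 1 / (c + rank) for ranks 1..k; the same for every model.
    contrib = 1.0 / (c + torch.arange(1, k + 1, device=device, dtype=torch.float32))
    # Concatenate each query's m ranked lists into one candidate list of length n.
    ids = ranked_ids.permute(1, 0, 2).reshape(q, n)
    scores = contrib.repeat(m).expand(q, n)
    # same[b, i, j] is True when candidates i and j of query b are the same document.
    same = ids.unsqueeze(2) == ids.unsqueeze(1)
    # fused[b, i] = sum over j of same[b, i, j] * scores[b, j], i.e. the document's RRF score.
    fused = (same.to(scores.dtype) * scores.unsqueeze(1)).sum(dim=2)
    # Mask every copy of a document except its first occurrence.
    pos = torch.arange(n, device=device)
    earlier = pos.unsqueeze(0) < pos.unsqueeze(1)  # earlier[i, j] is True when j < i
    is_repeat = (same & earlier).any(dim=2)
    fused = fused.masked_fill(is_repeat, float("-inf"))
    # Keep the top_k distinct documents per query (topk returns them sorted).
    best = fused.topk(top_k, dim=1).indices
    return ids.gather(1, best)


def rrf_fuse_chunked(ranked_ids: torch.Tensor, top_k: int, c: float = 60.0,
                     chunk_size: int = 1024) -> torch.Tensor:
    """Run rrf_fuse over query chunks to bound the (chunk, n, n) comparison tensor."""
    chunks = ranked_ids.split(chunk_size, dim=1)
    return torch.cat([rrf_fuse(chunk, top_k, c) for chunk in chunks], dim=0)


if __name__ == "__main__":
    # The three example lists for query_1, assuming the listed order is the rank
    # order and mapping "doc_<n>" to the integer n.
    ranked = torch.tensor([
        [[10, 11, 94, 1]],  # Embed_1
        [[5, 96, 10, 8]],   # Embed_2
        [[11, 2, 37, 4]],   # Embed_3
    ])
    print(rrf_fuse_chunked(ranked, top_k=3).tolist())  # [[11, 10, 5]]
```

The pairwise comparison needs memory proportional to queries * (m*k)^2 per chunk, so `chunk_size` would have to be tuned to the GPU; calling `ranked_ids.to("cuda")` first should be enough to run it there. Documents with exactly equal fused scores come back from `topk` in no guaranteed order.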

Let me know if anything needs clarifying.
