MongoDB Cosine Similarity Search with Source Filtering


I have a database in Cosmos DB containing around 1500 JSON documents. Some of these documents have the field 'source' set to 'A', while others have 'source' set to 'B'. I need to find the four most similar values by cosine similarity, searching only within the documents whose 'source' is 'B'. My attempt to restrict the search by document ID didn't yield the expected results.

I first filtered the documents where the 'source' field is 'B' and collected their IDs. I then passed these IDs to the $search stage of the MongoDB aggregation pipeline, together with the query vector for the cosine similarity search, the path to the field holding the stored vectors, and the number of closest matches required. Finally, I ran the aggregation and stored the response.

# Collect the _id of every document whose 'source' is 'B'.
filtered_documents = collection.find({'source': 'B'})
filtered_ids = [doc['_id'] for doc in filtered_documents]

query = [
    {
        "$search": {
            "cosmosSearch": {
                "ids": filtered_ids,
                "vector": embedded_vector,
                "path": "values",
                "k": k_closest
            },
            "returnStoredSource": True
        }
    }
]

response = collection.aggregate(query)

I expected the search to run only over the documents whose 'source' is 'B'. Instead, it runs over all documents.
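To make the intended behaviour concrete, here is a minimal client-side sketch of "filter by source, then rank by cosine similarity". It does not use `$search` or `cosmosSearch` at all; it is a swapped-in technique that only seems reasonable because the collection is small (around 1500 documents). The helper names `cosine_similarity` and `top_k_by_source` and the toy documents are hypothetical, and the sketch assumes each document stores its vector as a plain list of numbers under 'values'.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_by_source(documents, query_vector, source, k, path="values"):
    """Hypothetical helper: keep documents with the given 'source',
    then return the k best (score, document) pairs, highest score first."""
    candidates = [d for d in documents if d.get("source") == source and path in d]
    scored = [(cosine_similarity(d[path], query_vector), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data standing in for the collection (assumed document shape).
docs = [
    {"_id": 1, "source": "A", "values": [1.0, 0.0]},
    {"_id": 2, "source": "B", "values": [1.0, 0.0]},
    {"_id": 3, "source": "B", "values": [0.0, 1.0]},
    {"_id": 4, "source": "B", "values": [1.0, 1.0]},
]

results = top_k_by_source(docs, [1.0, 0.1], source="B", k=2)
result_ids = [doc["_id"] for _, doc in results]
```

Against the real collection, the `documents` argument could be the cursor returned by `collection.find({'source': 'B'})`, with `embedded_vector` and `k_closest` passed as the query vector and `k`. Note that this pulls every 'B' document to the client and bypasses the vector index, so it illustrates the expected result rather than replacing a server-side filtered search.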
