How to retrieve a document from PyMilvus given its ID?


using

connections.connect("default", host=cfg.db.connection.host, port=cfg.db.connection.port)
collection = Collection(name="ibss")

I can connect to Milvus database and select the collection.

using

query_vector = SentenceTransformerEmbeddings().embed_query(SOME_TEXT_HERE)
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10},
}

results = collection.search(data=[query_vector], param=search_params, anns_field="vector", limit=3, output_fields=[])

# the question is how to get the text out of ID 
# that can be used to remove
for hits in results:
    for hit in hits: 
        print(hit)
        break

I can get some output like

id: 445499747977925765, distance: 0.24354277551174164, entity: {}

It turned out that the entity is always empty regardless of which column I request, so I left output_fields as [] in order to include all fields.

Now, given the id 445499747977925765, I would like to retrieve the document.

So I tried

entity_id = 445499747977925765

# Define the filter condition
filter_expr = f"id == {entity_id}"

# Search with the filter condition
a = collection.search(
    data=[],
    anns_field="",
    param={"filter": filter_expr},
    limit=1
)

but a is empty!

To give you the full picture, I am populating the database using

from langchain.vectorstores import Milvus
vector_db = Milvus.from_documents(
    deduplicated_documents,
    SentenceTransformerEmbeddings(),
    connection_args={"host": cfg.db.connection.host, "port": cfg.db.connection.port},
    collection_name=cfg_data.target.collection.name,
    drop_old=cfg_data.target.collection.renew
)

So I would appreciate hearing how to retrieve the document given its ID.

There is 1 answer

alhassanha

According to the Milvus documentation, the filter condition should be passed through the expr argument rather than inside param, like this:

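a = collection.search(
    data=[query_vector],   # search still needs a query vector
    anns_field="vector",
    param=search_params,   # index/metric parameters only
    expr=filter_expr,      # the boolean filter goes here
    limit=1
)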

But since you are not trying to run a vector search here, it is better to use the query method instead of search:

a = collection.query(
    expr=filter_expr,
    limit=1
)
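
To make the lookup a bit more robust, here is a minimal sketch of a helper built on the query call above. The names build_id_expr and fetch_by_ids are hypothetical (they are not part of PyMilvus). Two assumptions worth checking against your own collection: the primary-key field may not be called id (as far as I know, LangChain's Milvus wrapper names it pk by default, which would make the filter id == ... fail), and the document text is assumed to live in a field called text. The helper reads the primary-key name from the collection schema and asks for the fields explicitly through output_fields, since an empty output_fields list returns no extra fields.

```python
from typing import Any, Iterable, List, Optional


def build_id_expr(primary_field: str, ids: Iterable[int]) -> str:
    """Build a Milvus boolean expression matching any of the given integer IDs."""
    id_list = ", ".join(str(int(i)) for i in ids)
    return f"{primary_field} in [{id_list}]"


def fetch_by_ids(collection: Any, ids: Iterable[int],
                 output_fields: Optional[List[str]] = None) -> List[dict]:
    """Scalar lookup by primary key via Collection.query (no vector search)."""
    # Read the primary-key field name from the schema instead of hard-coding "id".
    primary_field = collection.schema.primary_field.name
    expr = build_id_expr(primary_field, ids)
    # "text" is an assumed field name; pass output_fields to override it.
    return collection.query(expr=expr, output_fields=output_fields or ["text"])
```

With the collection from the question, the call would look like fetch_by_ids(collection, [445499747977925765]); each returned row is a dict containing the primary key and the requested fields.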