How to query a LanceDB vector database using the LangChain API with filters?

I am trying to use LangChain with LanceDB as the vector database. Here is how I instantiate the database:

import lancedb

db = lancedb.connect("./data/lancedb")

table = db.create_table("my_docs", data=[
    {"vector": embeddings.embed_query(chunks[0].page_content), "text": chunks[0].page_content, "id": "1", "file":"bb"}
], mode="overwrite")

I then load more documents with different file metadata:

vectordb = LanceDB.from_documents(chunks[1:], embeddings, connection=table)

Then I load another batch, also with a different file metadata value:

vectordb = LanceDB.from_documents(chunks_ma, embeddings, connection=table)

I can see they were loaded successfully and my vector DB has the correct number of documents:

print(len(db['my_docs']))

11

Now I want to create a retriever that pre-filters the data based on the file value. I tried this:

retriever = vectordb.as_retriever(search_kwargs={"k": 6, 'filter':{'file': 'bb'}})
retrieved_docs = retriever.invoke("My query regarding something")

But when I check the output of the query invocation, it still returns documents with the wrong file metadata value:

print(retrieved_docs[0].metadata['file'])

'cc'

But it was supposed to query only the documents in the database matching the file value.
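To check every returned document rather than only the first one, a small helper along these lines can list the results that violate the filter. This is a minimal sketch: `find_filter_violations` is a hypothetical name, and the stand-in objects below only mimic the `.metadata` dict of the retrieved documents.

```python
from types import SimpleNamespace


def find_filter_violations(docs, expected):
    """Return the docs whose metadata does not match every expected key/value."""
    return [
        doc
        for doc in docs
        if any(doc.metadata.get(key) != value for key, value in expected.items())
    ]


# Stand-in objects with a .metadata dict (illustrative values only).
sample_docs = [
    SimpleNamespace(metadata={"file": "bb"}),
    SimpleNamespace(metadata={"file": "cc"}),
]

violations = find_filter_violations(sample_docs, {"file": "bb"})
print([doc.metadata["file"] for doc in violations])
```

Calling it as `find_filter_violations(retrieved_docs, {"file": "bb"})` would show whether the filter is being ignored entirely or only partially.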

Is there something I am doing wrong? What is the correct approach to filter the values before running a retrieval query against a LanceDB vector DB using the LangChain API?
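One direction that might be worth comparing against, offered as a sketch rather than a confirmed fix: LanceDB's native query builder takes an SQL-style filter string through `where(...)`, and `prefilter=True` asks it to apply the filter before the vector search. The helper names `build_where_clause` and `search_with_prefilter` are hypothetical. The sketch assumes `file` is a top-level column, as in the `create_table` call above, and that the installed lancedb version provides `to_list()` on query results. If the metadata ends up in a nested `metadata` column, the key would presumably be `metadata.file` instead.

```python
from typing import Any, Mapping


def build_where_clause(filters: Mapping[str, Any]) -> str:
    """Turn a dict such as {"file": "bb"} into an SQL-style filter string."""
    parts = []
    for column, value in filters.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Quote strings and escape embedded single quotes by doubling them.
            literal = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"{column} = {literal}")
    return " AND ".join(parts)


def search_with_prefilter(table, query_vector, filters, k=6):
    """Vector search on a lancedb table, filtering rows before ranking.

    `table` is assumed to be a lancedb table (e.g. db.open_table("my_docs"))
    and `query_vector` the embedding of the query text.
    """
    return (
        table.search(query_vector)
        .where(build_where_clause(filters), prefilter=True)
        .limit(k)
        .to_list()
    )
```

With the objects from the question, a call might look like `search_with_prefilter(table, embeddings.embed_query("My query regarding something"), {"file": "bb"})`. This bypasses the LangChain retriever, so it is mainly useful for confirming that the filter works at the LanceDB level.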

There are 0 answers