I am trying to use LangChain with LanceDB as the vector database. Here is how I instantiate the database:
import lancedb

# embeddings and chunks are defined earlier (embedding model and split documents)
db = lancedb.connect("./data/lancedb")
table = db.create_table("my_docs", data=[
    {"vector": embeddings.embed_query(chunks[0].page_content), "text": chunks[0].page_content, "id": "1", "file": "bb"}
], mode="overwrite")
I then load more documents with different file metadata:
vectordb = LanceDB.from_documents(chunks[1:], embeddings, connection=table)
Then I load another batch, also with a different file metadata value:
vectordb = LanceDB.from_documents(chunks_ma, embeddings, connection=table)
I can see they were loaded successfully and my vector DB has the correct number of documents:
print(len(db['my_docs']))
11
Now I want to create a retriever that can pre-filter the data based on the file value.
I tried this:
retriever = vectordb.as_retriever(search_kwargs={"k": 6, 'filter':{'file': 'bb'}})
retrieved_docs = retriever.invoke("My query regarding something")
But when I check the output of the query, it still gives me documents with the wrong file metadata value:
print(retrieved_docs[0].metadata['file'])
'cc'
But it was supposed to query only the documents in the database matching the file value.
Am I doing something wrong, or what is the correct approach to filter on metadata before running a retrieval query against a LanceDB vector DB using the LangChain API?