How can I filter a LangChain vector database using the search_kwargs parameter of the as_retriever function?
Here is an example of what I would like to do:
# Let's say I have the following vector database
db = {'3c3bc745': Document(page_content="This is my text A", metadata={'Field_1': 'S', 'Field_2': 'R'}),
      '14f84778': Document(page_content="This is my text B", metadata={'Field_1': 'S', 'Field_2': 'V'}),
      'bd0022c9-449b': Document(page_content="This is my text C", metadata={'Field_1': 'Z', 'Field_2': 'V'})}
# Filter the vector database
retriever = db.as_retriever(search_kwargs={'filter': dict(Field_1='Z'), 'k': 1})
# Create the conversational chain
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0.0,
                   model_name='gpt-3.5-turbo',
                   deployment_id="chat"),
    retriever=retriever)
chat_history = []
prompt = "Which sentences do you have ?"
# I expect to get only "This is my text C", but I also get the two other page_content elements
chain({"question": prompt, "chat_history": chat_history})
If you are using DataStax Astra/Cassandra as the vector DB, it would be something like:
Full example here: https://github.com/smatiolids/astra-agent-memory/blob/main/Explicando%20Retrieval%20Augmented%20Generation.ipynb
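For reference, here is a minimal, library-independent sketch of the behaviour the question expects from `search_kwargs={'filter': ..., 'k': ...}`: keep only the documents whose metadata matches every key/value pair in the filter, then return at most `k` of them. `Doc`, `matches`, and `filtered_search` are hypothetical names made up for this illustration. They are not LangChain APIs, and the sketch does no similarity ranking. Whether and how a real vector store honours the `filter` key depends on the backend, so treat this only as a statement of the intended semantics.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def matches(metadata, metadata_filter):
    """Return True when every key/value pair of the filter is found in metadata."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())


def filtered_search(docs, search_kwargs):
    """Apply the 'filter' and 'k' entries of search_kwargs to a list of documents.

    No similarity ranking happens here; matching documents keep their input order.
    """
    metadata_filter = search_kwargs.get("filter") or {}
    k = search_kwargs.get("k", 4)  # 4 is only this sketch's default
    hits = [doc for doc in docs if matches(doc.metadata, metadata_filter)]
    return hits[:k]


# Same three documents as in the question.
docs = [
    Doc("This is my text A", {"Field_1": "S", "Field_2": "R"}),
    Doc("This is my text B", {"Field_1": "S", "Field_2": "V"}),
    Doc("This is my text C", {"Field_1": "Z", "Field_2": "V"}),
]

results = filtered_search(docs, {"filter": {"Field_1": "Z"}, "k": 1})
print([doc.page_content for doc in results])
```

If a retriever built with the same `search_kwargs` returns documents that this sketch would exclude, the filter is most likely being ignored or interpreted differently by the underlying store, which is worth checking in that store's own documentation.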