Load Chroma vectorstore from disk


I have created a vectorstore using Chroma and LangChain with three different collections and stored it in a persistent directory using the following code:

# Imports assumed from the legacy langchain package layout
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

def create_embeddings_vectorstorage(splitted):
    # splitted: dict with three keys; each value is a list of lists of LangChain Document objects
    embeddings = HuggingFaceEmbeddings()
    persist_directory = './chroma'
    vectorstores = {}
    for key, value in splitted.items():
        collection_name = key.lower()
        for documents in value:
            vectorstore = Chroma.from_documents(
                documents=documents, embedding=embeddings, persist_directory=persist_directory,
                collection_name=collection_name, collection_metadata={'animal': collection_name})
            vectorstore.persist()  # Persist the database to use it later (I think only needed in a Jupyter notebook)
            vectorstores[collection_name] = vectorstore
    return vectorstores

When creating this vectorstore, I have checked that the embeddings have been saved correctly with the following code:

print(vectorstore['bat']._collection.count())

OUTPUT: 1200

Now I want to load the vectorstore from the persistent directory into a new script. This script is stored in the same folder as the vectorstore. I have done this using the following code:

embeddings = HuggingFaceEmbeddings()
persist_directory = './chroma'

bat = Chroma(collection_name='bat', persist_directory=persist_directory, embedding_function=embeddings)

My **problem** is that, when I load it this way, a search returns no results, and the following code prints 0:

print(bat._collection.count())

OUTPUT: 0

As far as I can tell, the vectorstore is created correctly. The problem appears when I load it again, and I don't know which part I am getting wrong.

What I want: after creating a vectorstore with Chroma and saving it to a persistent directory, load the different collections in a new script. What I get: the vectorstore loads without errors, but the collection is empty.

The object itself loads fine:

print(bat)

OUTPUT: <langchain.vectorstores.chroma.Chroma object at 0x000001900281AED0> (a vectorstore object from Langchain)

but it contains no documents, and I don't know what I am doing wrong.
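
One thing I have not ruled out: './chroma' is a relative path, so it is resolved against the working directory the script is launched from, not against the folder the script lives in. If those differ, my understanding is that Chroma would simply open a new, empty store at the other location. Below is a minimal sketch of how this could be checked using only the standard library. The helper names `resolve_persist_dir` and `describe_dir` are hypothetical and not part of Chroma or LangChain.

```python
from pathlib import Path


def resolve_persist_dir(script_file, name="chroma"):
    """Return the absolute path of the folder `name` located next to `script_file`."""
    return Path(script_file).resolve().parent / name


def describe_dir(path):
    """Report whether `path` is an existing directory and list its entries."""
    path = Path(path)
    if not path.is_dir():
        return {"path": str(path), "exists": False, "entries": []}
    return {
        "path": str(path),
        "exists": True,
        "entries": sorted(p.name for p in path.iterdir()),
    }


# Intended usage inside the loading script (shown as comments, not executed here):
#   persist_directory = str(resolve_persist_dir(__file__))
#   print(describe_dir(persist_directory))  # the folder next to the script
#   print(describe_dir("./chroma"))         # what the relative path points at from the current working directory
```

If the two reports differ, passing the absolute path as `persist_directory` in both the creating and the loading script would rule this cause out. It would not explain the empty collection if both paths point at the same populated folder.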

Thank you all very much!


There are 0 answers