Path for ChromaDB persistent client


I want to set up ChromaDB to store embedded text. I'm following a screenshot from an article that sets up ChromaDB with persist_directory:


I'm quite confused about which path I should use. I'm running my script in a Databricks notebook, so I'm thinking of storing the embedded text in DBFS (Databricks File System). Hence, I used the following code:

import chromadb
chroma_client = chromadb.PersistentClient(path="/dbfs/ChromaDB")

When I run the code, I get this error:

OperationalError: disk I/O error

I'm not sure why I get this error: I have full read and write access, and there is enough memory because the embedded text is just a few lines of text converted to vectors (for testing). So is it that DBFS can't be used as persistent storage for ChromaDB, or is the path I used wrong? If DBFS can't be used, what are the available options?

I tried to find documentation or blog posts for guidance, but there doesn't seem to be any. Any help or advice will be greatly appreciated!

This is the article that I'm looking at:

https://pub.towardsai.net/harness-the-power-of-vector-databases-influencing-language-models-with-personalized-information-ab2f995f09ba



Accepted answer by Nathaniel Joselson:

As discussed in an answer to another question, the Databricks File System (DBFS) is distributed storage, so SQLite, which ChromaDB's PersistentClient uses to persist its data, can't acquire the file locks it needs to write the database to DBFS.
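ChromaDB's PersistentClient keeps its data in a SQLite database file inside the directory you pass as path (chroma.sqlite3 in recent versions), so one way to check whether a directory is usable, before involving ChromaDB at all, is to try a small SQLite write there with Python's standard library. The helper sqlite_can_persist below is hypothetical, not part of ChromaDB, and it assumes that a successful create/insert/commit is a reasonable proxy for what ChromaDB needs from the filesystem.

```python
import os
import sqlite3
import tempfile
import uuid


def sqlite_can_persist(directory: str) -> bool:
    """Return True if a SQLite database can be created and written in `directory`.

    Hypothetical diagnostic helper (not part of ChromaDB). It creates a throwaway
    database file, writes to it inside a transaction (which makes SQLite take its
    file locks), then deletes the file again.
    """
    db_path = os.path.join(directory, f"probe-{uuid.uuid4().hex}.sqlite3")
    conn = None
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE probe (id INTEGER PRIMARY KEY, note TEXT)")
        # The INSERT opens a write transaction; commit() makes SQLite lock and sync the file.
        conn.execute("INSERT INTO probe (note) VALUES (?)", ("hello",))
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM probe").fetchone()[0] == 1
    except sqlite3.OperationalError as exc:
        # e.g. "disk I/O error" or "unable to open database file"
        print(f"SQLite failed in {directory!r}: {exc}")
        return False
    finally:
        if conn is not None:
            conn.close()
        if os.path.exists(db_path):
            os.remove(db_path)


if __name__ == "__main__":
    print(sqlite_can_persist(tempfile.gettempdir()))  # local disk of this machine
    print(sqlite_can_persist("/dbfs/ChromaDB"))  # only meaningful on a Databricks cluster
```

If the locking explanation applies, the "/dbfs/ChromaDB" check should return False on the cluster (possibly with the same disk I/O error), while a local directory such as tempfile.gettempdir() returns True. That points at the storage location rather than at ChromaDB.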

The answer to that other question also mentions that you can specify the path differently so that SQLite will accept it as a persistence location.

So instead of:

chroma_client = chromadb.PersistentClient(path="/dbfs/ChromaDB")

You can do:

chroma_client = chromadb.PersistentClient(path="dbfs:/ChromaDB")

However, depending on where the code runs, Databricks may treat such a path as local storage on the compute cluster rather than as DBFS, in which case the data is lost when the cluster terminates. See the Databricks File System documentation for how the different path formats (dbfs:/... versus /dbfs/...) are resolved.
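To see why the dbfs:/ form can end up on the cluster's local storage, here is a small sketch. It assumes the path reaches ordinary POSIX file APIs (as a SQLite database file would) rather than Spark or dbutils, which do understand the dbfs:/ scheme. It only resolves strings with posixpath and touches no files; resolve_like_posix is a hypothetical helper, and /databricks/driver is an assumed working directory used purely for illustration (check os.getcwd() on your own cluster).

```python
import posixpath


def resolve_like_posix(path: str, cwd: str) -> str:
    """Resolve `path` the way a plain POSIX file API would from working directory `cwd`.

    Hypothetical helper for illustration only.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    # No URI-scheme handling here: "dbfs:" is just the name of a relative directory.
    return posixpath.normpath(posixpath.join(cwd, path))


if __name__ == "__main__":
    cwd = "/databricks/driver"  # assumed working directory, for illustration only
    for p in ["/dbfs/ChromaDB", "dbfs:/ChromaDB"]:
        print(f"{p!r} -> {resolve_like_posix(p, cwd)!r}")
    # '/dbfs/ChromaDB' -> '/dbfs/ChromaDB'
    # 'dbfs:/ChromaDB' -> '/databricks/driver/dbfs:/ChromaDB'
```

Under that assumption, "/dbfs/ChromaDB" points at the DBFS mount, while "dbfs:/ChromaDB" becomes a directory literally named "dbfs:" under the working directory on the node's local disk.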