Chroma.from_texts() 'numpy.ndarray' object has no attribute 'embed_documents' Error


I’m currently working on a project where I’m using the SentenceTransformer model from the sentence-transformers library to generate embeddings for text data. I would like to store these pre-generated embeddings in Chroma for later use.

Here’s a simplified version of my code:

from PyPDF2 import PdfReader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from sentence_transformers import SentenceTransformer

def generate_embeddings() -> Chroma:
    pdf = PdfReader("path_to_my_pdf")
    raw_text = ''
    for page in pdf.pages:
        content = page.extract_text()
        if content:
            raw_text += content
    text_splitter = CharacterTextSplitter(
        separator = "\n",
        chunk_size = 750,
        chunk_overlap  = 50,
        length_function = len,
    )
    texts = text_splitter.split_text(raw_text)
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(texts)
    vectordb = Chroma.from_texts(
        texts = texts,
        embeddings = embeddings,
        persist_directory = "path_to_persist_directory"
    )
    vectordb.persist()
    return vectordb

However, the call to Chroma.from_texts fails with the following error: "AttributeError: 'numpy.ndarray' object has no attribute 'embed_documents'".

Could you please guide me on how to create a Chroma object from pre-generated embeddings? Is there a method or a workaround that I can use to achieve this?

Any help would be greatly appreciated. Thank you in advance!
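
The error message suggests that Chroma.from_texts calls embed_documents on whatever it receives as its embedding argument, so it seems to expect an embedding object rather than an array of vectors. Below is a minimal sketch of one possible workaround, assuming from_texts only needs an object with embed_documents and embed_query methods. The names EncoderEmbeddings and build_vectordb are hypothetical and not part of LangChain or sentence-transformers. Note that with this approach Chroma computes the vectors through the model; it does not reuse the array returned earlier by model.encode(texts).

```python
from typing import List, Protocol, Sequence


class Encoder(Protocol):
    """Anything with a SentenceTransformer-style encode() method."""

    def encode(self, sentences: Sequence[str]): ...


class EncoderEmbeddings:
    """Hypothetical adapter exposing the two methods assumed to be required:
    embed_documents (named in the error message) and embed_query."""

    def __init__(self, model: Encoder) -> None:
        self.model = model

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # encode() may return a NumPy array; convert each row to plain floats.
        rows = self.model.encode(list(texts))
        return [[float(x) for x in row] for row in rows]

    def embed_query(self, text: str) -> List[float]:
        # A query is embedded the same way as a single document.
        return self.embed_documents([text])[0]


def build_vectordb(texts: List[str], model: Encoder, persist_directory: str):
    """Pass the adapter object, not the ndarray, as the embedding argument."""
    # Imported here so the adapter itself has no LangChain dependency.
    from langchain_community.vectorstores import Chroma

    return Chroma.from_texts(
        texts=texts,
        embedding=EncoderEmbeddings(model),
        persist_directory=persist_directory,
    )
```

With the code from the question, this would be called as build_vectordb(texts, SentenceTransformer('all-MiniLM-L6-v2'), "path_to_persist_directory"). LangChain also provides a HuggingFaceEmbeddings class that wraps sentence-transformers models and may make a custom adapter unnecessary. If the goal is to store vectors that have already been computed, the underlying chromadb collection accepts precomputed embeddings through its add method.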
