I was going through some online examples of using large language models for information retrieval and wanted to explore open-source solutions for that purpose. Several tutorials suggest that HuggingFaceInstructEmbeddings work well for document representation and that LLaMA-2 from Meta AI is good enough for text generation. However, when I run a very simple example, I get the error openai.error.RateLimitError, which as far as I understand should have nothing to do with non-OpenAI solutions. Any ideas what may be causing the error? I will soon get a paid subscription to the OpenAI API anyway, but it would be very useful to understand why such an error arises where one doesn't expect it.

P.S. I have to include lines 2 and 3 of the script (importing os and setting OPENAI_API_KEY) because, for the same unknown reason, it expects me to provide the key.

Brief description of my code snippet:

  • I provide a PDF and a question that I want answered from the contents of that document.
  • The document is read, split into chunks, converted to vector representations using the provided embedding function, and stored in the vector store.
  • The question is converted into a vector representation, and the most similar chunks in the vector store are retrieved.
  • The retrieved chunks are used to generate an answer to the question.

import sys
import os
os.environ['OPENAI_API_KEY'] = 'my_open_api_key'

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import Ollama
from langchain.agents.agent_toolkits import create_vectorstore_agent
from langchain.agents.agent_toolkits import VectorStoreInfo, VectorStoreToolkit
from langchain.embeddings import HuggingFaceInstructEmbeddings

rcts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
embedding = HuggingFaceInstructEmbeddings(model_name='hkunlp/instructor-xl')

def init_vectorstore(pdf_fn):
    loader = PyPDFLoader(pdf_fn)
    data = loader.load()
    splits = rcts.split_documents(data)
    return Chroma.from_documents(documents=splits, embedding=embedding)

def main(llm, pdf_fn, question):
    vs = init_vectorstore(pdf_fn)
    vi = VectorStoreInfo(name='vsi', description='simple example', vectorstore=vs)
    toolkit = VectorStoreToolkit(vectorstore_info=vi)
    agent_executor = create_vectorstore_agent(llm=llm, toolkit=toolkit, verbose=False)
    # Run the agent on the question and print its answer.
    print(agent_executor.run(question))

if __name__ == '__main__':
    llm = Ollama(model='llama2', temperature=1e-10)
    pdf_fn, question = sys.argv[1:3]
    main(llm, pdf_fn, question)
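
To make clear what I expect the retrieval part to do, here is a minimal standard-library sketch of the split / embed / find-most-similar steps from the list above. It is only an illustration: the word-count "embedding" is a toy stand-in for the instructor model, and split_text, embed, cosine and most_similar are hypothetical names, not LangChain functions. Nothing in it needs a remote API.

```python
import math
from collections import Counter


def split_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(question, chunks, k=1):
    """Return the k chunks whose embeddings are closest to the question's embedding."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = ["cats sleep all day", "llamas live in the andes", "python is a language"]
    print(most_similar("where do llamas live", chunks))
```

The retrieved chunks would then be passed to the local LLaMA-2 model as context for the answer, which is why I don't see where OpenAI should come into play.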

I won't paste the complete traceback because it is too long, but it ends like this:

Traceback (most recent call last):
  File "/Users/rsuleimanov/Documents/llm_deeds/cookbook/", line 38, in <module>
    main(llm, pdf_fn, question)
  File "/Users/rsuleimanov/Documents/llm_deeds/cookbook/", line 29, in main

  File "/Users/rsuleimanov/Documents/llm_deeds/langchainenv/lib/python3.9/site-packages/openai/", line 687, in _interpret_response_line
    raise self.handle_error_response(
openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details.
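
One way I could try to narrow this down is to walk the objects I create and list every attribute whose class name or module mentions OpenAI. The sketch below is a hypothetical helper (find_openai_components is not part of LangChain), and the demo uses stand-in classes instead of the real toolkit. It assumes the objects expose their fields through __dict__; whether any real LangChain object actually carries an OpenAI default is exactly what I would want to check with it.

```python
def find_openai_components(obj, path="obj", max_depth=4, _seen=None):
    """Return attribute paths whose class name or module mentions 'openai'."""
    _seen = set() if _seen is None else _seen
    if id(obj) in _seen or max_depth < 0:
        return []
    _seen.add(id(obj))

    hits = []
    cls = type(obj)
    if "openai" in f"{cls.__module__}.{cls.__name__}".lower():
        hits.append(path)

    # Recurse into instance attributes, if the object has any.
    for name, value in getattr(obj, "__dict__", {}).items():
        hits.extend(
            find_openai_components(value, f"{path}.{name}", max_depth - 1, _seen)
        )
    return hits


# Stand-in classes for the demo only; they are NOT the real LangChain classes.
class FakeOpenAI:
    pass


class FakeOllama:
    pass


class FakeToolkit:
    def __init__(self, llm=None):
        # Hypothetical behaviour: fall back to an OpenAI-like client if no llm is given.
        self.llm = llm if llm is not None else FakeOpenAI()


if __name__ == "__main__":
    print(find_openai_components(FakeToolkit(), "toolkit"))
    print(find_openai_components(FakeToolkit(llm=FakeOllama()), "toolkit"))
```

In my script I would call it as find_openai_components(toolkit, 'toolkit') and find_openai_components(agent_executor, 'agent_executor') inside main, before running the agent.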

