Estimating the cost of OpenAI usage in a RAG (retrieval-augmented generation) pipeline (LangChain, FAISS, OpenAI)


I am not sure how to estimate the complete cost of OpenAI usage in the RAG pipeline I am building. I want to estimate the token usage and the associated cost up front. Here are some code snippets:

from langchain.embeddings import OpenAIEmbeddings

model_name = "text-embedding-ada-002"

embeddings = OpenAIEmbeddings(model=model_name, openai_api_key=openai_api_key)

from langchain.vectorstores import FAISS

def create_and_load_faiss_index(chunks, embeddings, index_path):
    try:
        # Create a FAISS index from documents
        db = FAISS.from_documents(chunks, embeddings)

        # Save the FAISS index locally
        db.save_local(index_path)

        # Load the FAISS index from the saved location
        db = FAISS.load_local(index_path, embeddings)

        return db

    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

db = create_and_load_faiss_index(chunks, embeddings, index_path)

retriever = db.as_retriever()

template = """…"""

from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template=template)
print(prompt_template)

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

query = f"…"

openai_output = rag_chain.invoke(query)

Track OpenAI usage (embeddings AND RAG): To estimate the usage up front, I need to calculate the cost of calling "text-embedding-ada-002" in the embedding phase and "gpt-4" in the RAG phase. For the gpt-4 call in the RAG phase, I came across get_openai_callback(). But what about the embedding phase? And since gpt-4 retrieves its context from the FAISS index, retrieval itself does not consume any tokens in that sense, correct?
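For the embedding phase, one option is to count tokens locally with tiktoken before any API call (ada-002 uses the cl100k_base encoding). This is a minimal sketch; the $0.0001 per 1K tokens rate is an assumption based on older ada-002 pricing, so verify it against the current pricing page:

import tiktoken

enc = tiktoken.encoding_for_model("text-embedding-ada-002")

# chunks are LangChain Documents; the raw text lives in .page_content
corpus_tokens = sum(len(enc.encode(doc.page_content)) for doc in chunks)

# Assumed rate: $0.0001 per 1K tokens -- verify against current pricing
price_per_1k = 0.0001
print(f"Embedding tokens: {corpus_tokens}")
print(f"Estimated embedding cost: ${corpus_tokens / 1000 * price_per_1k:.6f}")

# Note: each rag_chain.invoke(query) also embeds the query with ada-002
# before searching the FAISS index, so those tokens can be counted the same way:
print(f"Query embedding tokens: {len(enc.encode(query))}")

And the get_openai_callback() usage for the gpt-4 call: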

from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    openai_output = rag_chain.invoke(query)
    print(cb)

Which returns something like:

Tokens Used: 37
    Prompt Tokens: 4
    Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05
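The same numbers are also available as attributes on the callback object, which is more convenient than printing the summary if you want to log or aggregate costs across calls:

with get_openai_callback() as cb:
    openai_output = rag_chain.invoke(query)

# The handler exposes the counters directly
print(cb.prompt_tokens, cb.completion_tokens, cb.total_tokens)
print(f"Cost for this call: ${cb.total_cost:.6f}")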


1 Answer

harufumi.abe:

In my case the result was different: I could not reduce the prompt tokens (on the contrary, both the prompt tokens and the response time increased slightly), but the embedded prompt returned a better answer.

Case 1

                                   prompt_tokens   completion_tokens   openai_process_time
Non-embedded prompt                3295            347                 4.253575
Embedded prompt (answered better)  3602            686                 8.553565

Case 2

                                   prompt_tokens   completion_tokens   openai_process_time
Non-embedded prompt                3355            347                 4.67733
Embedded prompt (answered better)  3669            583                 7.52354

These results were with GPT-3.5, but GPT-4 did not reduce the prompt tokens either.
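For anyone reproducing measurements like these: the token counts come from the usage field of the API response, and the timing can be approximated client-side. A minimal sketch, assuming the openai>=1.0 client (the model name and prompt are placeholders):

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "your prompt here"}],
)
elapsed = time.perf_counter() - start

# usage is reported by the API itself; elapsed is only a client-side
# approximation of openai_process_time
print("prompt_tokens:", response.usage.prompt_tokens)
print("completion_tokens:", response.usage.completion_tokens)
print(f"elapsed: {elapsed:.3f}s")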