I am not sure how to estimate the complete cost of OpenAI usage in the RAG pipeline I am building: I want to estimate the token usage and the associated cost before making the calls. Here are some code snippets:
model_name = "text-embedding-ada-002"
embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=openai_api_key,
)
def create_and_load_faiss_index(chunks, embeddings, index_path):
    try:
        # Create a FAISS index from the document chunks (this embeds every
        # chunk, so it is where the embedding cost is incurred)
        db = FAISS.from_documents(chunks, embeddings)
        # Save the FAISS index locally
        db.save_local(index_path)
        # Reload the index from disk (recent LangChain versions also require
        # allow_dangerous_deserialization=True here)
        db = FAISS.load_local(index_path, embeddings)
        return db
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
db = create_and_load_faiss_index(chunks, embeddings, index_path)
retriever = db.as_retriever()

template = """…"""
prompt_template = ChatPromptTemplate.from_template(template=template)
print(prompt_template)

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

query = f"…"
openai_output = rag_chain.invoke(query)
Track OpenAI usage (embeddings AND RAG): To estimate the usage beforehand, I need to calculate the cost of calling "text-embedding-ada-002" in the embedding phase and "gpt-4" in the RAG phase. For the gpt-4 call in the RAG phase I came across get_openai_callback(). What about the embedding phase? Since gpt-4 retrieves its context from the FAISS index, that retrieval itself does not consume any tokens, correct?
with get_openai_callback() as cb:
openai_output = rag_chain.invoke(query)
print(cb)
Which returns something like:
Tokens Used: 37
Prompt Tokens: 4
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05
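For the embedding phase, one option I am considering is to count the chunk tokens locally (e.g. with tiktoken, since text-embedding-ada-002 uses the cl100k_base encoding) and multiply by the published per-token price. A minimal cost sketch; the prices below are copied from OpenAI's pricing page at the time of writing and will change, so treat them as placeholders:

```python
# Assumed prices in USD per 1K tokens -- check OpenAI's current pricing page,
# these values change over time and are only placeholders here.
PRICES = {
    "text-embedding-ada-002": {"prompt": 0.0001, "completion": 0.0},
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}


def estimate_cost(model, prompt_tokens, completion_tokens=0):
    """Cost in USD for a given token count.

    Token counts can come from get_openai_callback() after the fact, or
    upfront from tiktoken, e.g.:
        len(tiktoken.encoding_for_model(model).encode(text))
    summed over all chunks for the embedding phase.
    """
    p = PRICES[model]
    return (prompt_tokens / 1000 * p["prompt"]
            + completion_tokens / 1000 * p["completion"])


# Embedding 100k tokens worth of chunks with ada-002 (approx. $0.01):
print(estimate_cost("text-embedding-ada-002", 100_000))

# The callback's token counts above (4 prompt + 33 completion tokens):
print(estimate_cost("gpt-4", 4, 33))
```

Incidentally, with these assumed prices, 4 prompt + 33 completion tokens at the gpt-3.5-turbo rate works out to exactly the $7.2e-05 in the sample callback output, while gpt-4 rates would give roughly $0.0021 — so the numbers shown are illustrative only.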
In my case the result was not the same. I could not reduce the prompt tokens (on the contrary, the embedded prompt slightly increased both the prompt tokens and the response time), but the embedded prompt returned a better answer.

Case 1
Non-embedded prompt
Embedded prompt (answered better)

Case 2
Non-embedded prompt
Embedded prompt (answered better)

Those results were with GPT-3.5, but GPT-4 did not reduce the prompt tokens either.
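That behaviour is expected: retrieval does not save prompt tokens, because the retrieved chunks are pasted into the prompt's {context} slot before the request is sent, so the RAG prompt is strictly longer than the bare one. A toy illustration with plain strings and no API calls (the template and example texts here are made up, loosely mirroring the ChatPromptTemplate in the question):

```python
# Hypothetical template in the spirit of the question's ChatPromptTemplate.
template = "Answer using the context.\n\nContext: {context}\n\nQuestion: {question}"

question = "What is the refund policy?"
retrieved_context = "Refunds are issued within 30 days of purchase."

# Without retrieval, the context slot stays empty; with retrieval, the
# chunks are inlined into the prompt that is billed as prompt tokens.
bare_prompt = template.format(context="", question=question)
rag_prompt = template.format(context=retrieved_context, question=question)

# Retrieval only ever adds text to the prompt, it never removes any.
print(len(bare_prompt), len(rag_prompt))
```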