Handling Null Embeddings and Missing Data in Pinecone for Startup Information Retrieval


I'm working with structured data for startup companies, and I'm using Pinecone for vector embeddings of textual data in the "LongDescription" field. However, some entries in my dataset have null values in the "LongDescription" field, while still containing information in other columns such as "CompanyName," "FoundingDate," "Website," and "Funding."

When I query for a company whose "LongDescription" is null (let's call it "Company A"), the answer built from the Pinecone results comes back as "no info." I'd like to know how to handle queries for entries where the text field is null.

Here's a simplified version of my code:

from langchain.schema import HumanMessage
from langchain.vectorstores import Pinecone

text_field = "LongDescription"

# Initialize the vector store object
# (index, embeddings, chat, and messages are set up elsewhere in the full code)
vectorstore = Pinecone(index, embeddings.embed_query, text_field)

def augment_prompt(query: str):
    # Get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # Get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # Feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""
    return augmented_prompt

prompt = HumanMessage(
    content=augment_prompt("How much funding is for company A?")
)

res = chat(messages + [prompt])
print(res.content)

Answer:

I'm sorry, but I couldn't find any specific information about the funding for Company A in the given contexts. It's possible that the information related to that query is not included in the provided contexts.

My issue is that when I query for "Company A," which lacks a "LongDescription," I get a "no info" response. Is there a way to handle such queries more effectively, considering the presence of data in other columns? Any suggestions or guidance would be greatly appreciated.
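One direction that might help, sketched here under assumptions rather than as a confirmed fix: a likely cause is that only "LongDescription" is embedded and returned as page_content, so the other columns never reach the prompt. The sketch below builds the text to embed from the structured columns and appends the description only when it exists. The helper names (is_missing, build_embedding_text), the FALLBACK_FIELDS tuple, and the sample record are hypothetical; the column names are the ones mentioned above.

```python
from typing import Any, Mapping

# Column names are taken from the question; which ones to include is an assumption.
FALLBACK_FIELDS = ("CompanyName", "FoundingDate", "Website", "Funding")


def is_missing(value: Any) -> bool:
    """Treat None, NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return isinstance(value, str) and not value.strip()


def build_embedding_text(record: Mapping[str, Any],
                         text_field: str = "LongDescription") -> str:
    """Return the text to embed: structured fields, plus the description if present."""
    parts = [
        f"{name}: {record[name]}"
        for name in FALLBACK_FIELDS
        if not is_missing(record.get(name))
    ]
    description = record.get(text_field)
    if not is_missing(description):
        parts.append(f"{text_field}: {description}")
    return "\n".join(parts)


# Hypothetical record with a null description, mirroring "Company A".
record = {
    "CompanyName": "Company A",
    "FoundingDate": None,
    "Website": "https://example.com",
    "Funding": "$2M",
    "LongDescription": None,
}
text = build_embedding_text(record)
print(text)
```

If the string produced this way is what gets embedded and stored under the metadata key passed as text_field, a record like "Company A" would still have non-empty text to match against, and its funding value would appear in the retrieved context. Whether this suits the dataset depends on how the index was populated, which the question does not show.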
