I want to use LangChain to give my own context to an OpenAI GPT model and query my data through it. First, I use LangChain.js to load documents from the provided file path and split them into chunks. The split documents are then embedded and stored in a Pinecone database, giving me a vector store via the Pinecone library. That store is used to create a QA chain over the LLM, which I then use to ask questions about my data.
This is my current implementation:
main.js
import { Document } from "langchain/document";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { CharacterTextSplitter } from "langchain/text_splitter";
import { PineconeClient } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { OpenAI } from "langchain/llms/openai";
import { VectorDBQAChain } from "langchain/chains";
import path from "path";
const openAIApiKey = process.env.OPEN_AI_API_KEY;
async function main(filePath) {
  // create document array, seeded with an entry that records the file path
  // (pageContent is required, so the path is mirrored there as well)
  const docs = [
    new Document({
      pageContent: `Filepath: ${filePath}`,
      metadata: { name: `Filepath: ${filePath}` },
    }),
  ];
  // pick a loader based on the file extension
  const Loader = path.extname(filePath) === ".pdf" ? PDFLoader : TextLoader;
  const loader = new Loader(filePath);
  // load the file and split it into documents
  const loadedAndSplit = await loader.loadAndSplit();
  // add the loaded docs to the array
  docs.push(...loadedAndSplit);
  // create splitter
  const textSplitter = new CharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 0,
  });
  // use the splitter to break the docs into chunks
  const splittedDocs = await textSplitter.splitDocuments(docs);
  // initialize the pinecone client and get a reference to the index
  const client = new PineconeClient();
  await client.init({
    apiKey: process.env.PINECONE_API_KEY,
    environment: process.env.PINECONE_ENVIRONMENT,
  });
  const pineconeIndex = client.Index(process.env.PINECONE_INDEX);
  // create openai embeddings
  const embeddings = new OpenAIEmbeddings({ openAIApiKey });
  // embed the chunks and upsert them into the pinecone index as a vector store
  const pineconeStore = await PineconeStore.fromDocuments(
    splittedDocs,
    embeddings,
    {
      pineconeIndex,
      namespace: "my-pinecode-index",
    }
  );
  // initialize openai model
  const model = new OpenAI({
    openAIApiKey,
    modelName: "gpt-3.5-turbo",
  });
  // create a QA chain over the llm model and the pinecone store
  const chain = VectorDBQAChain.fromLLM(model, pineconeStore, {
    k: 1,
    returnSourceDocuments: true,
  });
  // use the chain to query my data
  const response = await chain.call({
    query: "Explain about the contents of the pdf file I provided.", // question is based on the file I provided
  });
  console.log(`\nResponse: ${response.text}`);
}
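I call it like this (the path is just a placeholder for one of my test files):
main("./data/example.pdf").catch(console.error);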
Note: My Pinecone index has a dimension of 1536, which matches the output dimension of the default OpenAIEmbeddings model (text-embedding-ada-002). Whenever I tried a different dimension size, I got an error like Vector dimension 1536 does not match the dimension of the index 1000.
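In case it matters, this is roughly how an index with a matching dimension can be created (a sketch assuming the same 0.x PineconeClient; the exact createIndex shape may differ in other versions):
await client.createIndex({
  createRequest: {
    name: process.env.PINECONE_INDEX,
    dimension: 1536, // must match the embedding model's output dimension
    metric: "cosine",
  },
});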
The responses I get are totally unexpected. Sometimes it answers a straightforward, non-trivial question, but often it behaves as if the model never received the context from my data at all: it denies knowing even the simplest things. I got the basic idea of the implementation from the LangChain.js documentation.
I tried swapping the GPT model for the text-davinci models, changing the chunk size, and recreating the Pinecone store, but none of that made a difference.
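Is there a way to verify the retrieval step on its own? I'm assuming something like this would show which chunks actually come back for a question (similaritySearch bypasses the LLM entirely; the query string is just a test example):
// inspect what the vector store retrieves, independent of the LLM
const results = await pineconeStore.similaritySearch(
  "Explain the contents of the pdf file I provided.",
  4 // fetch a few chunks rather than just one
);
console.log(results.map((doc) => doc.pageContent));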
Can anyone tell me what I'm doing wrong here, or suggest what I should be doing instead?
Any help is appreciated. Thank you.