How to re-use embedded documents for Few-Shot LLM queries in Langchain4j?

Question

208 views Asked by Sachu At 14 December 2023 at 14:32

I have an LLM chat model with a token limit. I am trying to pass sample user messages and expected AI message responses to the LLM as few-shot examples, to show it how to respond based on text extracted from a document. I load the document with the file system loader:

 Document document = loadDocument(toPath("file:///filepath\\filename.pdf"));

I use a regex splitter to help the LLM understand a pattern:

DocumentByRegexSplitter splitter = new DocumentByRegexSplitter(regex, joiner, maxCharLimit, maxOverlap, subSplitter);

After embedding the document (using an in-memory embedding store and retrieving the relevant matches), I join the matched segments into an information string, which I feed into a prompt template to generate a user message:

PromptTemplate promptTemplate = PromptTemplate.from(
        "Answer the following question to the best of your ability:\n"
                + "\n"
                + "Question:\n"
                + "{{question}}\n"
                + "\n"
                + "Base your answer on the following information:\n"
                + "{{information}}");

String information = relevantEmbeddings.stream()
        .map(match -> match.embedded().text())
        .collect(joining("\n\n"));

Map<String, Object> variables = new HashMap<>();
variables.put("question", trainingQuestion);
variables.put("information", information);
Prompt prompt = promptTemplate.apply(variables);


List<ChatMessage> chatMessages = new ArrayList<>();
// Few-shot example: a sample question plus the expected answer
chatMessages.add(prompt.toUserMessage());
chatMessages.add(new AiMessage("Expected Response"));

// Actual question, built from the same template and the same information
variables.put("question", actualQuestion);
variables.put("information", information);
prompt = promptTemplate.apply(variables);
chatMessages.add(prompt.toUserMessage());

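I add the training messages to a List, as the LangChain4j API requires, and send the list to the model:

// generate(List<ChatMessage>) returns Response<AiMessage>; content() unwraps the AiMessage
AiMessage response = chatModel.generate(chatMessages).content();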

To make a long story short, I am hitting the token limit because the same document information is included in every few-shot message. Is there a way to make the LLM use one copy of the document as a reference for both the few-shot examples and the actual query, so that I avoid spending tokens on the document multiple times?
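As a rough back-of-the-envelope estimate of where the tokens go (the symbols are illustrative and not from the original post): assume the joined document text costs D tokens, each question about q tokens, each expected answer about a tokens, and there are k few-shot examples. Ignoring the fixed template wording, repeating the document in every user message versus sending it once gives approximately:

```latex
T_{\text{repeated}} \approx (k+1)(D + q) + k\,a
\qquad
T_{\text{shared}} \approx D + (k+1)\,q + k\,a
\qquad
T_{\text{repeated}} - T_{\text{shared}} \approx k\,D
```

So under these assumptions each additional few-shot example costs one more full copy of the document when the text is repeated per message.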

Original Q&A

There is 1 answer

**Sachu** · Accepted Answer · 2023-12-15T06:40:22+00:00

A colleague suggested adding the document to a SystemMessage so that it does not have to be passed again with each training message and the actual user message. I will try this and update.
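A minimal sketch of what that message layout could look like, written with plain Java stand-ins so it runs without any dependencies. The names `Message`, `Role`, `Example`, `buildMessages` and `countMessagesContaining` are hypothetical and not part of LangChain4j; with the real library the three roles would presumably map to `SystemMessage.from(text)`, `UserMessage.from(text)` and `AiMessage.from(text)`, and the resulting list would be passed to `chatModel.generate(...)`. Whether the model follows the few-shot examples as well with this layout has not been verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FewShotWithSharedContext {

    // Hypothetical stand-ins for LangChain4j's SystemMessage / UserMessage / AiMessage.
    enum Role { SYSTEM, USER, AI }

    static final class Message {
        final Role role;
        final String text;
        Message(Role role, String text) { this.role = role; this.text = text; }
    }

    // One few-shot example: a sample question and the answer expected for it.
    static final class Example {
        final String question;
        final String expectedAnswer;
        Example(String question, String expectedAnswer) {
            this.question = question;
            this.expectedAnswer = expectedAnswer;
        }
    }

    // Puts the retrieved document text into the system message exactly once;
    // every user message then carries only its question.
    static List<Message> buildMessages(String information,
                                       List<Example> examples,
                                       String actualQuestion) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message(Role.SYSTEM,
                "Answer each question to the best of your ability.\n"
                        + "Base your answers on the following information:\n"
                        + information));
        for (Example example : examples) {
            messages.add(new Message(Role.USER, "Question:\n" + example.question));
            messages.add(new Message(Role.AI, example.expectedAnswer));
        }
        messages.add(new Message(Role.USER, "Question:\n" + actualQuestion));
        return messages;
    }

    // Counts how many messages contain the given text.
    static long countMessagesContaining(List<Message> messages, String text) {
        return messages.stream().filter(m -> m.text.contains(text)).count();
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        String information = "Invoice 1042 was issued on 3 May and is due on 2 June.";
        List<Example> examples = Arrays.asList(
                new Example("When was invoice 1042 issued?", "3 May"),
                new Example("When is invoice 1042 due?", "2 June"));

        List<Message> messages =
                buildMessages(information, examples, "Who issued invoice 1042?");
        for (Message m : messages) {
            System.out.println(m.role + ": " + m.text + "\n");
        }
        System.out.println("Messages containing the document text: "
                + countMessagesContaining(messages, information));
    }
}
```

The point of the layout is that the few-shot pairs and the final question all refer back to the single copy of the document in the system message, so adding another example adds only a question and an answer to the prompt.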
