codellama generates newline characters repeatedly


I am using LangChain with CodeLlama via llama.cpp (Hugging Face: TheBloke/CodeLlama-34B-Instruct-GPTQ). I have 4 Tesla T4s in my device. I installed llama.cpp with OpenBLAS. When I load the model from the GGUF file, I can see the parameter BLAS=1, and nvidia-smi shows GPU memory utilization increasing while the model loads. When I generate with CodeLlama directly through Llama(), it generates well.
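For comparison, the direct llama-cpp-python call that works for me looks roughly like this (the prompt here is just a placeholder):

from llama_cpp import Llama

# Direct llama-cpp-python generation, which works fine for me
llm = Llama(model_path="../../llm-models/codellama-34b-instruct.Q4_K_M.gguf",
            n_ctx=4096,
            n_gpu_layers=100,  # offloads layers to the Tesla T4s (BLAS=1 in the load log)
            n_threads=8)
out = llm("Write a Python function that reverses a string.", max_tokens=256)
print(out["choices"][0]["text"])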

But when I try to use PromptTemplate and LLMChain, it fails: the model does not generate meaningful results, it just generates many \n characters as output. I don't understand why. While it is running, I can see the GPU utilization, so it is using my Tesla GPUs.

I am using the <<SYS>> token to give some additional information to the LLM.
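For context, the prompt format I am targeting for CodeLlama-Instruct (as I understand it from the Llama 2 chat format) is:

<s>[INST] <<SYS>>
{system prompt}
<</SYS>>

{user message} [/INST]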

My code looks like this:

%set_env TEMPERATURE=0.5 
%set_env GPU_LAYERS=100
%set_env MODEL_PATH=../../llm-models/codellama-34b-instruct.Q4_K_M.gguf
%set_env MODEL_N_CTX=4096
%set_env TOP_P=0.95
%set_env TOP_K=40
%set_env THREADS=8
%set_env EMBEDDINGS_MODEL_NAME=all-mpnet-base-v2

import os

from langchain.llms import LlamaCpp

# The settings are read back from the environment variables set above
model_path = os.environ["MODEL_PATH"]
model_n_ctx = int(os.environ["MODEL_N_CTX"])
threads = int(os.environ["THREADS"])
gpu_layers = int(os.environ["GPU_LAYERS"])
temperature = float(os.environ["TEMPERATURE"])
top_p = float(os.environ["TOP_P"])
top_k = int(os.environ["TOP_K"])

stop = ['Human:', 'Assistant:', 'User:']

llm = LlamaCpp(model_path=model_path,
               n_ctx=model_n_ctx,
               verbose=True,
               n_threads=threads,
               n_gpu_layers=gpu_layers,
               n_batch=model_n_ctx // 8,  # integer division; n_batch must be an int
               stop=stop,
               temperature=temperature,
               top_p=top_p,
               top_k=top_k,
               use_mlock=False,
               max_tokens=2000,
               )

template = f"""<s>[INST] <<SYS>>
{custom_initial_prompt}""" + """
<</SYS>>
problem description:
{text}
code:
{code}[/INST]"""
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt_template = PromptTemplate(template=template, input_variables=["text", "code"])
chain = LLMChain(llm=llm, prompt=prompt_template, verbose=True)
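To rule out a templating issue, the prompt can be rendered with placeholder inputs to check that the [INST]/<<SYS>> structure survives the f-string and PromptTemplate combination:

# Sanity check with placeholder inputs: the rendered prompt should keep the
# <s>[INST] <<SYS>> ... <</SYS>> ... [/INST] structure
print(prompt_template.format(text="dummy problem description", code="print('hello')"))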

%%time
result = chain.run(code=script[0].page_content, text=pages[0].page_content)

The result of this prompt looks like the screenshot below.

[screenshot: the output is mostly a long run of repeated \n characters]

Sometimes it generates a lot of newline characters. How can I solve this problem?

I increased the number of GPUs. I tried the smaller version of the CodeLlama model (the 7B model). I tried different CUDA versions. I also tried loading the small model on my local computer, and it works well there.
