Suppress LlamaCpp stats output


How can I suppress the LlamaCpp stats output in LangChain? Equivalent code:

from langchain.llms import LlamaCpp   # LangChain's llama.cpp wrapper

llm = LlamaCpp(model_path=..., ....)
llm('who is Caesar')


> who is Caesar ?
 Julius Caesar was a Roman general and statesman who played a critical role in the events that led to the demise of the Roman Republic and the rise of the Roman Empire. He is widely considered one of Rome's greatest warlords and is often ranked alongside his adopted son, Octavian, as one of the two most important figures in ancient
llama_print_timings:        load time =   532.05 ms
llama_print_timings:      sample time =    32.74 ms /    71 runs   (    0.46 ms per token,  2168.40 tokens per second)
llama_print_timings: prompt eval time = 29011.08 ms /   432 tokens (   67.16 ms per token,    14.89 tokens per second)
llama_print_timings:        eval time = 10284.56 ms /    70 runs   (  146.92 ms per token,     6.81 tokens per second)
llama_print_timings:       total time = 39599.38 ms
 Rome.

There are 2 answers

Answered by sten

The reason is that LangChain doesn't support a "verbose" parameter; you can edit the __init__() method of its LlamaCpp class and add it.
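If you would rather not patch the class, another workaround (not part of this answer) is to silence the C-level stderr stream around the call, since llama.cpp writes its timing stats there. A minimal sketch using only the standard library; the suppress_stderr helper name is made up for illustration:

import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stderr():
    # llama.cpp prints its stats from C code, so redirect the OS-level
    # stderr file descriptor (fd 2) to /dev/null for the duration.
    stderr_fd = sys.stderr.fileno()
    saved_fd = os.dup(stderr_fd)
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, stderr_fd)
        yield
    finally:
        os.dup2(saved_fd, stderr_fd)   # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)

# usage with the llm object from the question:
# with suppress_stderr():
#     llm('who is Caesar')

Note this also hides any genuine error messages written to stderr while the block is active.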

Answered by Malgo

The code below worked fine for me with GGUF models; I added the verbose parameter while loading the model.

from llama_cpp import Llama

# verbose=False silences the llama.cpp load and timing output
llm = Llama(model_path="/path/to/model.gguf",
            verbose=False)

This stopped the stats output for model loading as well as inference.

Source: https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
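If you want the same effect through the LangChain wrapper from the question, a sketch along these lines may work, assuming your LangChain version forwards verbose to llama_cpp.Llama (check your installed version; older releases did not expose it):

from langchain.llms import LlamaCpp

# Assumption: this LangChain version passes verbose through to llama_cpp.Llama.
llm = LlamaCpp(model_path="/path/to/model.gguf",
               verbose=False)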