I'm prompting my local LLM on my Ubuntu box via a Jupyter notebook and getting responses; all of that works fine. Now I'd like to time how long the full cycle takes from submitting the prompt to receiving the final response. I'm not sure how this is usually measured; tokens per second?
I've used the Linux `time` command and its output is useful, but are there other methods?
Here is an example snippet of what I use currently:
```python
%%time
result = qa_chain("my question to the LLM?")
```

Output:

```
LLM response...
CPU times: user 1.85 s, sys: 84 ms, total: 1.93 s
Wall time: 1.93 s
```
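One thing I've tried beyond `%%time` is wrapping the call with `time.perf_counter()` and deriving a rough tokens-per-second figure from the answer. This is just a sketch: it assumes `qa_chain` returns a dict with the answer under `"result"` (the LangChain RetrievalQA convention), and it approximates the token count by whitespace splitting, since an exact count would need the model's tokenizer.

```python
import time

start = time.perf_counter()
result = qa_chain("my question to the LLM?")
elapsed = time.perf_counter() - start

# Assumption: qa_chain returns a dict with the answer under "result".
answer = result["result"]

# Rough proxy for token count; an exact count needs the model's tokenizer.
approx_tokens = len(answer.split())

print(f"inference time: {elapsed:.2f} s")
print(f"~{approx_tokens / elapsed:.1f} tokens/s (whitespace approximation)")
```

This still measures wall time for the whole chain (retrieval plus generation), not generation alone, which is partly why I'm asking.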
I'd love some sort of simple, LLM-specific output like:
- inference time
- GPU time (see the sketch just below)
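For the GPU-time item, the closest I've gotten is CUDA events via PyTorch. A sketch under two assumptions: the model actually runs on a CUDA device inside this same Python process (this won't capture work done by a separate server process, e.g. an Ollama daemon), and everything between the two `record()` calls, CPU overhead included, counts toward the measurement.

```python
import torch

# Events on the CUDA timeline; enable_timing is required for elapsed_time().
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
result = qa_chain("my question to the LLM?")  # assumes the chain runs on a CUDA device in-process
end.record()

torch.cuda.synchronize()  # wait for all queued GPU work to finish
print(f"GPU time: {start.elapsed_time(end) / 1000:.2f} s")  # elapsed_time() returns milliseconds
```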
Has anyone built or seen something useful like this?