I'm prompting my local LLM on my Ubuntu box via a Jupyter notebook and getting responses; all of that works fine. Now I'd like to time how long the full cycle takes from submitting the prompt to receiving the final response. I'm not sure how this is usually measured; tokens per second?
I've used the Linux `time` command and its output is useful, but are there other methods?
Here is an example snippet of what I use currently:
```python
%%time
result = qa_chain("my question to the LLM?")
```

Output:

```
LLM response...
CPU times: user 1.85 s, sys: 84 ms, total: 1.93 s
Wall time: 1.93 s
```
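One thing I've tried beyond `%%time` is wrapping the call with `time.perf_counter()` and deriving a rough tokens-per-second figure from the answer. This is just a sketch: it assumes `qa_chain` returns a dict with the answer under `"result"` (the LangChain RetrievalQA convention), and it approximates the token count by whitespace splitting, since an exact count would need the model's tokenizer.

```python
import time

start = time.perf_counter()
result = qa_chain("my question to the LLM?")
elapsed = time.perf_counter() - start

# Assumption: qa_chain returns a dict with the answer under "result".
answer = result["result"]

# Rough proxy for token count; an exact count needs the model's tokenizer.
approx_tokens = len(answer.split())

print(f"inference time: {elapsed:.2f} s")
print(f"~{approx_tokens / elapsed:.1f} tokens/s (whitespace approximation)")
```

This still measures wall time for the whole chain (retrieval plus generation), not generation alone, which is partly why I'm asking.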
I'd love some sort of simple, LLM-specific output like:
- inference time
- GPU time (see the sketch just below)
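For the GPU-time item, the closest I've gotten is CUDA events via PyTorch. A sketch under two assumptions: the model actually runs on a CUDA device inside this same Python process (this won't capture work done by a separate server process, e.g. an Ollama daemon), and everything between the two `record()` calls, CPU overhead included, counts toward the measurement.

```python
import torch

# Events on the CUDA timeline; enable_timing is required for elapsed_time().
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
result = qa_chain("my question to the LLM?")  # assumes the chain runs on a CUDA device in-process
end.record()

torch.cuda.synchronize()  # wait for all queued GPU work to finish
print(f"GPU time: {start.elapsed_time(end) / 1000:.2f} s")  # elapsed_time() returns milliseconds
```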
Has anyone built or seen something useful like this?