I am trying to measure the CPU time of a small function that usually takes about 1500 microseconds. I sometimes get inaccurate results or a wide confidence interval. I would like to find the most accurate way to benchmark the function in Python 3.7. I also need the function's return values for other calculations, so I have to call it directly rather than through a timing harness.

I tried default_timer from timeit; see my code below.

    from timeit import default_timer as timer

    times = []  # store the times for 100 runs, then get min, max, average, etc.

    for i in range(100):
        t1 = timer()
        x, y, z = apply_message(s, text)
        t2 = timer()
        execution_time = (t2 - t1) * 1000000  # convert to microseconds
        times.append(execution_time)  # store each run's time

I found that sometimes the minimum is 1300 and the maximum is 75000, a big difference. In other cases they are close together. At a minimum, I want results that give me a 95% confidence interval within 10% of the average.
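For reference, here is how the statistics in question can be computed from the collected list. This is a sketch: `busy_work` is a made-up placeholder standing in for `apply_message`, since that function isn't shown, and the confidence-interval formula assumes the timings are roughly normally distributed (which, as noted below, outliers can break).

```python
import statistics
from timeit import default_timer as timer

def busy_work():  # placeholder for the real function under test
    return sum(i * i for i in range(20000))

times = []
for _ in range(100):
    t1 = timer()
    busy_work()
    t2 = timer()
    times.append((t2 - t1) * 1e6)  # microseconds

mean = statistics.mean(times)
stdev = statistics.stdev(times)
# Half-width of a 95% confidence interval for the mean,
# assuming approximately normal data (z = 1.96):
half_width = 1.96 * stdev / len(times) ** 0.5
print(f"min={min(times):.0f}us max={max(times):.0f}us "
      f"mean={mean:.0f}us +/- {half_width:.0f}us")
```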

1 Answer

Peter Cordes

It's normal to get outliers if (for example) your process gets migrated to another CPU during execution. (So all memory accesses miss in cache for a while, because things were hot in L1d and L2 on the previous core). This can happen in real life, too.

So you have to decide what you want to measure: the "normal" nothing-weird-happened case, or the full distribution including the worst possible case.

It's definitely not a Gaussian normal distribution if you keep the outliers, so take any statistics with a grain of salt if they're based on that assumption!

If you want to exclude outliers, pin the CPU frequency, and pin your process to a single core. But you can still get outliers when an interrupt handler or other kernel task does any significant amount of work on that core, or a page fault or whatever.
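Pinning the process to one core can be done from within Python on Linux with `os.sched_setaffinity` (this call doesn't exist on Windows or macOS, hence the `hasattr` guard); this is only the process-pinning half, CPU-frequency pinning has to be done outside Python (e.g. via the governor settings).

```python
import os

# Linux-only: restrict this process to CPU core 0 so the scheduler
# can't migrate it mid-benchmark (one source of outliers).
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})  # pid 0 = the current process
    print("pinned to core(s):", os.sched_getaffinity(0))
else:
    print("sched_setaffinity not available on this OS")
```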

Or on a CPU with Hyperthreading, if another thread runs on the sibling logical core that shares the same physical core. Or another task on the same machine competes for shared resources like memory bandwidth or cache footprint, or disk I/O.

If you know enough about what your function does and how Python runs it that you can be reasonably certain that the outliers aren't "real", i.e. that your process didn't have the CPU for most of that wall-clock time, or a CPU migration happened, then you can just discard outliers above some threshold.

Or look at the median time instead of mean. Median is not sensitive to huge outliers, but will still respond to variation.
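Both ideas above can be sketched in a few lines, using a made-up sample (microseconds) with one huge outlier; the 3x-median cutoff is an arbitrary threshold, pick one that suits your workload:

```python
import statistics

times = [1510, 1490, 1530, 1500, 75000, 1495, 1520]  # made-up sample, us

# Option 1: discard outliers above a threshold (here: 3x the median).
med = statistics.median(times)
kept = [t for t in times if t <= 3 * med]
print("trimmed mean:", statistics.mean(kept))  # the 75000us run is dropped

# Option 2: just report the median, which the outlier barely moves.
print("median:", med)
```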

If you're timing with the same input repeatedly so you expect the function to take the same time, you can take the minimum. (Usually essentially equal to the median.)
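The minimum-of-repeats idiom is what `timeit.repeat` is built for; a sketch with a placeholder workload (note this discards the function's return value, so it doesn't satisfy the questioner's need to keep the results, but it illustrates the measurement):

```python
import timeit

def work():  # stand-in for the real function
    return sum(i * i for i in range(20000))

# repeat() returns the total time of each batch of `number` calls;
# the min over batches estimates the nothing-weird-happened time.
per_call = min(timeit.repeat(work, repeat=7, number=10)) / 10
print(f"best-case time per call: {per_call * 1e6:.0f} us")
```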