Averaging runtimes for performance analysis

Modern computers and OSes are complicated: schedulers, branch predictors, caches, prefetchers and so on all make runtimes hard to predict and hard to reproduce. I don't understand these things, but I thought I understood the implication: running a program once isn't enough.

Luckily, `perf stat` provides a `--repeat` option (`-r`) and even gives you rudimentary statistics. So to test this, I ran

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        puts("Hello, World!");
        return 0;
    }

compiled with `gcc -O2 hello.c -o hello`, using the command `perf stat -r 100 ./hello`. This gives me nice output like this:

    0,00043149 +- 0,00000688 seconds time elapsed  ( +-  1,59% )

However, if I repeat the whole `perf stat -r 100` run a few more times, the averages can differ considerably from one invocation to the next:

    0,00043149 +- 0,00000688 seconds time elapsed  ( +-  1,59% )
    0,00043222 +- 0,00000657 seconds time elapsed  ( +-  1,52% )
    0,00041690 +- 0,00000612 seconds time elapsed  ( +-  1,47% )
    0,00045048 +- 0,00000832 seconds time elapsed  ( +-  1,85% )
    0,0005051  +- 0,0000232 seconds time elapsed  ( +-  4,60% )
    0,00043595 +- 0,00000676 seconds time elapsed  ( +-  1,55% )
    0,0004271  +- 0,0000168 seconds time elapsed  ( +-  3,94% )
    0,00043166 +- 0,00000604 seconds time elapsed  ( +-  1,40% )
    0,0010521 +- 0,0000548 seconds time elapsed  ( +-  5,21% )
    0,00042799 +- 0,00000714 seconds time elapsed  ( +-  1,67% )

Here the relative standard deviation of the ten averages is 37%, largely caused by the second-to-last outlier. But even if I discount that run, it's still 5.5%, much larger than the roughly ±1.5% that most single 100-run invocations report.

So what is happening here? Why doesn't averaging work (in this case)? What should I be doing?

Edit: This also happens with frequency scaling disabled (`sudo cpupower frequency-set --governor performance`), although the outliers seem less frequent.
