So modern computers and OSes are complicated, and there is a lot going on (schedulers, branch predictors, caches, prefetchers, etc.) that makes runtimes hard to predict and hard to reproduce. I don't understand these mechanisms in detail, but I thought I understood the implication: running a benchmark once isn't enough.
Luckily, perf stat provides a --repeat option and even gives you rudimentary statistics. So to test this, I ran
#include <stdio.h>
int main(int argc, char *argv[])
{
    puts("Hello, World!");
    return 0;
}
compiled with gcc -O2 hello.c -o hello, using the command perf stat -r 100 ./hello. This gives me nice output like this:
0,00043149 +- 0,00000688 seconds time elapsed ( +- 1,59% )
However, if I now run this whole measurement again a couple of times, the average runtime can be far from that of the previous run:
0,00043149 +- 0,00000688 seconds time elapsed ( +- 1,59% )
0,00043222 +- 0,00000657 seconds time elapsed ( +- 1,52% )
0,00041690 +- 0,00000612 seconds time elapsed ( +- 1,47% )
0,00045048 +- 0,00000832 seconds time elapsed ( +- 1,85% )
0,0005051 +- 0,0000232 seconds time elapsed ( +- 4,60% )
0,00043595 +- 0,00000676 seconds time elapsed ( +- 1,55% )
0,0004271 +- 0,0000168 seconds time elapsed ( +- 3,94% )
0,00043166 +- 0,00000604 seconds time elapsed ( +- 1,40% )
0,0010521 +- 0,0000548 seconds time elapsed ( +- 5,21% )
0,00042799 +- 0,00000714 seconds time elapsed ( +- 1,67% )
Here the relative deviation of the ten averages is 37%, largely caused by the second-to-last outlier. But even if I discount that run, it's still 5.5%, much larger than the deviation that perf reports within a "single" run.
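For reference, these figures can be reproduced with a quick calculation over the averages listed above (a rough sketch; I use the population standard deviation, and the exact decimals depend on that choice):

/* compile with: gcc -O2 stats.c -o stats -lm */
#include <stdio.h>
#include <math.h>

/* relative standard deviation (population) in percent */
static double rel_stddev(const double *x, int n)
{
    double sum = 0.0, sq = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        sq  += x[i] * x[i];
    }
    double mean = sum / n;
    double var  = sq / n - mean * mean;
    return 100.0 * sqrt(var) / mean;
}

int main(void)
{
    /* the ten "seconds time elapsed" averages from the runs above */
    double runs[] = { 0.00043149, 0.00043222, 0.00041690, 0.00045048,
                      0.0005051,  0.00043595, 0.0004271,  0.00043166,
                      0.0010521,  0.00042799 };
    printf("all runs:        %.1f%%\n", rel_stddev(runs, 10));

    /* same data with the second-to-last outlier (0.0010521 s) removed */
    double trimmed[] = { 0.00043149, 0.00043222, 0.00041690, 0.00045048,
                         0.0005051,  0.00043595, 0.0004271,  0.00043166,
                         0.00042799 };
    printf("without outlier: %.1f%%\n", rel_stddev(trimmed, 9));
    return 0;
}

This prints values close to the 37% and 5.5% quoted above (37% and roughly 5.6% by my calculation; the small difference comes down to rounding and which standard deviation convention is used).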
So what is happening here? Why doesn't averaging work (in this case)? What should I be doing?
Edit: This also happens when frequency scaling is disabled (governor set to performance via sudo cpupower frequency-set --governor performance), but the outliers seem less frequent.
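For completeness, a small check like the following (a sketch that assumes the usual Linux cpufreq sysfs layout under /sys/devices/system/cpu/cpuN/cpufreq/) can be used to confirm which governor is actually active on each CPU:

#include <stdio.h>

int main(void)
{
    char path[128], gov[64];

    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                          /* no more CPUs, or no cpufreq support */
        if (fgets(gov, sizeof gov, f))
            printf("cpu%d: %s", cpu, gov);  /* gov already ends with a newline */
        fclose(f);
    }
    return 0;
}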