I want to profile application runtimes whose binaries will actually be run under a simulator (NS-3/DCE). I wanted to use the Linux performance counters, and I expected the instruction count for an application that has no source of non-determinism to be deterministic. According to the Linux performance counters, I couldn't have been more wrong. Let's take a simple example:
```
$ (perf stat -c -- sleep 1 2>&1 && perf stat -c -- sleep 1 2>&1) | grep instructions
669218 instructions # 0,61 insns per cycle
682286 instructions # 0,58 insns per cycle
```
1) What is the source of this non-determinism? Does it stem from branch prediction and other low-level engines in the CPU?
2) Is there a way to know the number of instructions fed to the CPU (in contrast to the number of instructions in the example output), in order to get the amount of executed code in a deterministic way?
Summary:
1) The non-determinism is caused by variation in the `sleep 1` command, not by branch prediction or other microarchitectural features.
2) You can find the number of instructions fetched by using a hardware event counter, if your CPU supports one. However, this count will vary more than the number of instructions retired (which is what perf typically reports for `instructions`).
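As a hedged illustration of that second point: whether a fetch-style event exists, and what it is called, depends on the CPU and kernel version, so the first step is just to see what perf exposes on the machine at hand (the grep pattern below is only a guess at likely event names):

```
# List the events perf knows about on this machine and look for anything
# fetch- or decode-related; event names vary by microarchitecture.
$ perf list | grep -i -E 'fetch|decode'
```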
Details:
The `sleep` command is not a good test case if you want a deterministic number of instructions to execute. It will execute a non-deterministic number of instructions because there will be some slight variation in what the kernel is doing. You can specify whether to collect user-mode or kernel-mode instruction counts by using `instructions:u` for user mode or `instructions:k` for kernel mode, as in the sketch below.
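The exact command and output from the original two runs are not reproduced here; a minimal sketch of the kind of invocation described (standard `perf stat` event-modifier syntax, with machine-dependent counts) would be:

```
# Count user-mode and kernel-mode retired instructions separately for one run.
$ perf stat -e instructions:u -e instructions:k -- sleep 1
```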
Comparing two such runs, the actual elapsed time of `sleep 1` varies slightly, which is the source of the non-determinism. However, the number of user-mode instructions shows less variation than the number of kernel-mode instructions.
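As a further hedged suggestion beyond the original answer: `perf stat` can repeat the measurement itself and report the spread, which makes this variation easy to quantify:

```
# Run the workload 10 times; perf prints the mean count and the relative
# standard deviation for each event. Counts will differ per machine/kernel.
$ perf stat -r 10 -e instructions:u -e instructions:k -- sleep 1
```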