RDTSC slow in Ubuntu

I have a piece of inlined assembly that I compile with clang++:

    uint64_t retval;   // 64-bit result (needs <cstdint>)
    asm volatile ("LFENCE\n\t"             // wait for earlier loads before reading the TSC
                  "RDTSC\n\t"              // counter is returned in EDX:EAX
                  "shl $32, %%rdx\n\t"     // move the high half up
                  "or %%rdx, %%rax\n\t"    // combine into one 64-bit value in RAX
                  : "=a" (retval)
                  :
                  : "%rdx");               // RAX is the output operand, so only RDX is clobbered

On OS X, the total cost of checking rdtsc as above is around 10-20 cycles. When I compile the same code on Linux (not a virtual machine), it takes around 2500 cycles. This leads me to suspect that Linux is doing something dumb, like disabling RDTSC in user space. From articles I have read, it looks like this has at least been considered for Linux.

I am running Ubuntu 14.04.

Questions:

  • Did the kernel-mode-only rdtsc restriction actually make it into the Ubuntu kernel?
  • If it is there, how do I detect the current setting?
  • And how do I get user-mode rdtsc working again?

PS: I am fully aware of the usual rdtsc pitfalls (wrong measurements, pipeline flushing, etc.). I can live with them and I am taking precautions where needed. I just want rdtsc to be fast.

1 Answer

Answered by Brendan:

Intel have been saying "A secure operating system would set the TSD flag during system initialisation to disable user access to the time stamp counter" ever since the instruction was introduced about 20 years ago. Most OSs ignored Intel, and every 5 years or so some security researcher somewhere "discovers" a new way of using such precise timing to attack passwords, encryption keys, etc. Examples: http://people.csail.mit.edu/tromer/papers/cache.pdf , http://www.daemonology.net/papers/htt.pdf

If you add to that the problems caused by people assuming it ticks at a constant rate (it doesn't on old CPUs), the problems caused by people assuming it says anything about performance (on newer CPUs with a constant-rate TSC it no longer tracks the core clock), and the people who just plain use it wrong (e.g. timing a single short sequence, where the error is massive), it starts to seem like an even worse idea.

If you then add to that the problems with "out-of-sync TSC" on multi-CPU systems (especially NUMA systems), it gets much worse (especially for the kernel trying to keep it "sort of synchronised maybe").

Finally, if you have a look at things like performance monitoring counters, profilers, etc., you realise that RDTSC is the wrong tool for that job. Then you look in the other direction at "time of day" and "elapsed time" functions, and realise there are decent, portable alternatives there too.
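
For example, a minimal sketch of the portable "elapsed time" route using std::chrono::steady_clock (which on Linux normally boils down to a cheap vDSO clock_gettime(CLOCK_MONOTONIC) call, so it stays fast without touching RDTSC directly):

    #include <chrono>
    #include <cstdio>

    int main() {
        auto start = std::chrono::steady_clock::now();

        // ... the code being timed ...

        auto stop = std::chrono::steady_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        std::printf("elapsed: %lld ns\n", static_cast<long long>(ns));
    }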

Note: I don't know if Ubuntu has disabled RDTSC in user-space on all systems, or just on some systems (e.g. systems where it's not constant rate and/or not synchronised between CPUs), or even if they haven't disabled it at all. All I know is that it should've been disabled 20 years ago.
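
If you want to check what your own process actually gets, one thing you can try is asking the kernel via prctl() (a minimal sketch; PR_GET_TSC/PR_SET_TSC are x86-only, and this only reflects the per-process setting, not whatever a distribution might do globally):

    #include <sys/prctl.h>
    #include <cstdio>

    int main() {
        int state = 0;
        if (prctl(PR_GET_TSC, &state, 0, 0, 0) != 0) {
            std::perror("prctl(PR_GET_TSC)");
            return 1;
        }
        std::printf("RDTSC is %s for this process\n",
                    state == PR_TSC_ENABLE ? "enabled"
                                           : "disabled (reads would raise SIGSEGV)");
        // To force it back on for this process (if the kernel allows it):
        //     prctl(PR_SET_TSC, PR_TSC_ENABLE, 0, 0, 0);
        return 0;
    }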

EDIT: Above is the answer to the question asked. Below is the answer you need.

To use RDTSC properly, start by timing "nothing" in a loop, while discarding "higher than normal" results (caused by IRQs, task switches, etc.). Use this to find an average for "nothing" (the average overhead of RDTSC alone).

Next, do exactly the same thing for the code you're testing (including discarding "higher than normal" results), to find the average overhead of "RDTSC + your code".

Finally, subtract the average overhead of RDTSC alone from the "RDTSC + your code" result to find how long your code would've taken on its own.
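
For example, a rough sketch of that procedure (the helper names, the sample count and the "twice the smallest sample" outlier cut-off are arbitrary choices of mine; the fenced read is the one from the question):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Fenced TSC read, as in the question.
    static inline uint64_t rdtsc() {
        uint64_t retval;
        asm volatile ("LFENCE\n\t"
                      "RDTSC\n\t"
                      "shl $32, %%rdx\n\t"
                      "or %%rdx, %%rax\n\t"
                      : "=a" (retval)
                      :
                      : "%rdx");
        return retval;
    }

    // Average cost of `body` in cycles, discarding "higher than normal" samples
    // (IRQs, task switches, etc.) -- here, anything over twice the smallest sample.
    template <typename F>
    static double average_cycles(F body, int samples = 100000) {
        std::vector<uint64_t> t(samples);
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < samples; i++) {
            uint64_t t0 = rdtsc();
            body();
            uint64_t t1 = rdtsc();
            t[i] = t1 - t0;
            if (t[i] < best) best = t[i];
        }
        double sum = 0;
        int kept = 0;
        for (int i = 0; i < samples; i++) {
            if (t[i] <= 2 * best) { sum += (double)t[i]; kept++; }
        }
        return sum / kept;
    }

    int main() {
        double overhead  = average_cycles([]{});                         // RDTSC alone
        double with_code = average_cycles([]{ /* code under test */ });  // RDTSC + your code
        std::printf("RDTSC overhead: %.1f cycles\n", overhead);
        std::printf("code alone:     %.1f cycles\n", with_code - overhead);
    }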