How would I go about monitoring a particular process's execution (namely, its branches, from the Branch Trace Store) using the Intel Performance Counter Monitor, while filtering out other processes' information?
Intel Performance Monitor -- any way to monitor per-process?
2.5k views, asked by user541686
There are 3 answers
4
We were forced to build our own instrumenting profiler that reads the MSRs directly to get this information. The Performance Counter Monitor's source code demonstrates how to build a kernel driver that reads them.
Previously we used VTune, but it crashes when run on our app. (When we tried OProfile on the Linux version, it actually crashed the entire kernel and forced us to power-cycle the machine, which was pretty funny.)
1
Check out https://github.com/andikleen/pmu-tools/blob/master/toplev.py
Example: "toplev.py -l2 program" measures the whole system at level 2 while the program is running.
You should know that BTS (Branch Trace Store) and performance monitoring events/counters (in the CPU's PMU block) are very different things.
The Branch Trace Store is a CPU feature that records every taken branch in a special area of memory. Each record is a pair of EIP values (first the address of the branch instruction, then the address of the branch target) plus a word of flags. The result is much like single-stepping and recording the order in which code blocks (basic blocks) are executed. It is also like compiler-assisted code coverage, where the compiler instruments every branch.
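To make the record format concrete, here is a minimal sketch of decoding a raw BTS buffer once you have it in user space. It assumes the 64-bit layout as I understand it from the Intel manual (three 64-bit fields per record: branch-from, branch-to, flags, with bit 4 of the flags meaning "predicted"). The names parse_bts and block_sequence and the sample addresses are made up for illustration; check the layout against the manual for your CPU.

```python
import struct
from collections import namedtuple

BtsRecord = namedtuple("BtsRecord", "from_ip to_ip predicted")

# Assumed 64-bit layout: three little-endian 64-bit fields per record.
RECORD = struct.Struct("<QQQ")
PREDICTED_BIT = 1 << 4  # assumption: bit 4 of the flags word = "branch predicted"


def parse_bts(buf):
    """Decode a raw BTS buffer into a list of BtsRecord tuples."""
    usable = len(buf) - len(buf) % RECORD.size  # ignore a trailing partial record
    return [
        BtsRecord(src, dst, bool(flags & PREDICTED_BIT))
        for src, dst, flags in RECORD.iter_unpack(buf[:usable])
    ]


def block_sequence(records):
    """Turn taken branches into (block_start, block_end) pairs.

    A basic block starts at one branch's target and ends at the next
    branch instruction, so consecutive records delimit executed blocks.
    """
    return [(a.to_ip, b.from_ip) for a, b in zip(records, records[1:])]


# Hypothetical sample data: three taken branches.
sample = b"".join(RECORD.pack(*r) for r in [
    (0x401000, 0x401020, PREDICTED_BIT),
    (0x401030, 0x401000, 0),
    (0x401010, 0x401040, PREDICTED_BIT),
])
records = parse_bts(sample)
blocks = block_sequence(records)
```

Each pair in blocks is one executed basic block, which is the "order of executed code blocks" view described above.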
BTS is enabled by a bit in the MSR_DEBUGCTLA MSR (an Intel x86 model-specific register). I'm almost sure that this register is thread-specific (as it is on Linux), so you don't need to hook the scheduler. There are some examples of working with this MSR on Windows, but they use a different bit. Also, don't forget to set DS_AREA correctly. So, if you really want BTS, get a copy of the Intel Architecture Manual (Volume 3B, part "Debugging and Performance Monitoring", section "19.7.8 Branch Trace Store (BTS)") and program BTS manually. The hardest part is handling DS area overflow (you need a custom interrupt handler).
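As a rough illustration of the "program it manually" step, this sketch only computes the value you would write to the debug-control MSR; it does not touch hardware. The bit positions are my assumption for the Core-family and later IA32_DEBUGCTL layout (older Pentium 4 parts use different positions), and debugctl_for_bts is a hypothetical helper, so verify everything against the manual before using it in a driver.

```python
IA32_DEBUGCTL = 0x1D9  # MSR address of the debug-control register
IA32_DS_AREA = 0x600   # MSR holding the linear address of the DS save area

# Assumed bit positions (Core-family and later); verify for your CPU.
TR = 1 << 6            # enable branch trace messages
BTS = 1 << 7           # store those messages in the BTS buffer
BTINT = 1 << 8         # raise an interrupt when the buffer threshold is hit
BTS_OFF_OS = 1 << 9    # do not record branches taken in ring 0
BTS_OFF_USR = 1 << 10  # do not record branches taken in ring 3


def debugctl_for_bts(current, user_only=True, interrupt_on_full=True):
    """Return the DEBUGCTL value that turns BTS on, preserving other bits."""
    value = current & ~(BTINT | BTS_OFF_OS | BTS_OFF_USR)  # reset the option bits
    value |= TR | BTS
    if interrupt_on_full:
        value |= BTINT
    if user_only:
        value |= BTS_OFF_OS  # filter out kernel-mode branches
    return value


new_value = debugctl_for_bts(0)
```

A driver would write this value to the debug-control MSR only after pointing the DS area MSR at a properly initialised buffer.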
If you want statistics about your program rather than a trace of executed code (how many instructions were executed, how well branches were predicted, how many indirect branches there are, ...), you should use performance monitoring events with event-based sampling; Intel's precise variant is called "Precise Event Based Sampling" (PEBS). Intel VTune does this, and there should be other tools too, even the Intel Performance Counter Monitor you linked. The only problem (it is a bit harder with free tools) is finding the names of the events you want. Events based on instruction execution are always bound to some thread.
What event-based sampling means: you set a limit, e.g. 1000, for some event, e.g. BR_INST_EXEC.COND ("number of conditional near branch instructions executed") or BR_INST_EXEC.DIRECT ("all unconditional near branch instructions excluding calls and indirect branches"), with up to 2-4 events at once. The CPU then counts every occurrence of the event. On the 1000th occurrence, an interrupt is generated and the EIP of the instruction is recorded. With sampling it is easy to get detailed statistics about your code's behaviour. If you set the limit very low and don't sum the events per EIP, you get a trace ;)
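The idea can be simulated in a few lines without any hardware. This is only a toy model: sample_events is a hypothetical function, the addresses are invented, and a real PMU counts in hardware and delivers the sample through an interrupt. I use 997 instead of 1000 as the limit because a limit that divides evenly into a loop's length would land on the same instruction every time.

```python
from collections import Counter


def sample_events(event_ips, period):
    """Simulate a counter that raises an interrupt every `period` events.

    `event_ips` is the instruction address at which each counted event
    occurred; the returned list holds the address seen at each overflow.
    """
    samples = []
    count = 0
    for ip in event_ips:
        count += 1
        if count == period:  # counter overflow: record where we are
            samples.append(ip)
            count = 0
    return samples


# Hypothetical workload: the branch at 0x400100 executes three times
# as often as the branch at 0x400200.
events = [0x400100, 0x400100, 0x400100, 0x400200] * 10000

# Summing samples per address gives a statistical profile.
profile = Counter(sample_events(events, period=997))

# A limit of 1 with no summing degenerates into a full trace.
trace = sample_events(events, period=1)
```

Here profile ends up with 40 samples split 3:1 between the two addresses, matching the real execution ratio, while trace is simply the full event list.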
With PEBS you can find out how badly your code behaves on the CPU: where the mispredicted branches are, which instructions wait for data from the cache, etc. There are hundreds of events (Appendix A of Volume 3B).
PS: there is some BTS code for Windows: http://blog.csdn.net/quincy_hu/article/details/4053163
PPS: there is a shorter overview of PMU programming, covering both PEBS and BTS: software.intel.com/file/30320 It is written for Nehalem, but it may still be relevant for Sandy Bridge.