OS: Ubuntu 18.04
Question: How to profile a multi-process program?
I usually profile a program with the Linux perf tool as follows:
perf stat -d ./main [args]
This command prints a detailed set of performance counters, for example:
47,455.09 msec task-clock # 8.602 CPUs utilized
129,199 context-switches # 0.003 M/sec
92 cpu-migrations # 0.002 K/sec
16,228 page-faults # 0.342 K/sec
117,757,409,457 cycles # 2.481 GHz (49.84%)
236,496,093,412 instructions # 2.01 insn per cycle (62.31%)
1,454,901,353 branches # 30.658 M/sec (62.18%)
6,168,091 branch-misses # 0.42% of all branches (62.30%)
183,462,410,176 L1-dcache-loads # 3866.021 M/sec (62.55%)
189,736,991 L1-dcache-load-misses # 0.10% of all L1-dcache hits (62.75%)
8,330,520 LLC-loads # 0.176 M/sec (50.14%)
628,142 LLC-load-misses # 7.54% of all LL-cache hits (50.25%)
5.516529249 seconds time elapsed
46.947476000 seconds user
0.989185000 seconds sys
What I focus on is CPU efficiency (line 1, task-clock / CPUs utilized), IPC (line 6, instructions), and L1 and LLC bandwidth (lines 9 and 11, L1-dcache-loads and LLC-loads).
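The metrics named above are simple ratios of the raw counters, and perf's own annotations can be reproduced from them. The C++ sketch below shows that arithmetic. The struct and function names are hypothetical. The LLC bandwidth estimate rests on an assumption that perf itself does not report: every LLC load moves one 64-byte cache line.

```cpp
#include <cassert>

// Raw values copied from one `perf stat -d` report (hypothetical struct).
struct PerfCounts {
    double task_clock_ms;    // "task-clock", msec of CPU time
    double elapsed_s;        // "seconds time elapsed" (wall clock)
    double cycles;           // "cycles"
    double instructions;     // "instructions"
    double l1d_loads;        // "L1-dcache-loads"
    double l1d_load_misses;  // "L1-dcache-load-misses"
    double llc_loads;        // "LLC-loads"
    double llc_load_misses;  // "LLC-load-misses"
};

// "CPUs utilized": CPU time divided by wall-clock time.
double cpus_utilized(const PerfCounts& c) {
    assert(c.elapsed_s > 0);
    return (c.task_clock_ms / 1000.0) / c.elapsed_s;
}

// "insn per cycle" (IPC).
double ipc(const PerfCounts& c) {
    assert(c.cycles > 0);
    return c.instructions / c.cycles;
}

// perf's "M/sec" column: events per second of task-clock (CPU time), not wall time.
double rate_per_cpu_second(double events, const PerfCounts& c) {
    assert(c.task_clock_ms > 0);
    return events / (c.task_clock_ms / 1000.0);
}

// perf's "% of all ... hits" column: misses divided by loads
// (perf labels the denominator "hits", but it is the load count).
double miss_ratio(double misses, double loads) {
    assert(loads > 0);
    return misses / loads;
}

// Rough LLC load bandwidth in bytes per wall-clock second.
// ASSUMPTION: each LLC load transfers one cache line of `line_bytes` bytes.
double llc_bandwidth_bytes_per_s(const PerfCounts& c, double line_bytes = 64.0) {
    assert(c.elapsed_s > 0);
    return c.llc_loads * line_bytes / c.elapsed_s;
}
```

Fed the numbers from the report above, ipc() gives about 2.01 and cpus_utilized() about 8.60, which match perf's annotations. Running `mpiexec -np 3 perf stat -d ./main [args]` typically starts a separate perf instance for each rank. Each rank's report can then be pushed through these helpers on its own, although the reports may be interleaved on the terminal.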
But now I need to profile every process of an MPI program. Assume we launch 3 processes with:
mpiexec -np 3 ./main [args]
How can I get the CPU efficiency, IPC, L1, and LLC information for each process separately? (With perf stat -d I only get aggregate numbers for all 3 processes combined, which is not enough for me.)
The output I want looks like this:
PID: 1
LLC Band.: xxx
PID: 2
LLC Band.: xxx
PID: 3
LLC Band.: xxx
How can I do this? (Can GNU gperf do this? Or is there some way to do it from C++?)
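For the C++ route, one option is the Linux perf_event_open system call, which is the interface perf itself is built on. Each MPI rank is a separate process, so each rank can open its own counters and report them under its own PID.

The sketch below is a minimal, hypothetical example: open_counter, count_process, RankCounts and print_rank_report are illustrative names. It counts only user-space cycles and instructions. LLC events can be requested the same way with PERF_TYPE_HW_CACHE, using the config encoding documented in the perf_event_open(2) man page. The sketch assumes the kernel allows unprivileged counting (see /proc/sys/kernel/perf_event_paranoid) and returns false otherwise.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <initializer_list>

// Per-process result (hypothetical struct for illustration).
struct RankCounts {
    long pid;
    std::uint64_t cycles;
    std::uint64_t instructions;
};

// Open one hardware counter for the calling process (pid = 0) on any CPU (cpu = -1).
// glibc provides no wrapper for perf_event_open, so it is called through syscall().
static int open_counter(std::uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;        // start stopped; enabled explicitly in count_process
    attr.inherit = 1;         // also count threads created after this call
    attr.exclude_kernel = 1;  // user space only (permitted when perf_event_paranoid is 2)
    attr.exclude_hv = 1;
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
}

// Count cycles and instructions of this process while `work` runs.
// Returns false if the counters cannot be opened or read.
template <typename Fn>
bool count_process(Fn&& work, RankCounts& out) {
    int fd_cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int fd_insns = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (fd_cycles == -1 || fd_insns == -1) {
        if (fd_cycles != -1) close(fd_cycles);
        if (fd_insns != -1) close(fd_insns);
        return false;
    }
    for (int fd : {fd_cycles, fd_insns}) {
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    }
    work();  // the region of interest, e.g. one rank's compute phase
    for (int fd : {fd_cycles, fd_insns}) ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t cycles = 0, insns = 0;
    const bool ok =
        read(fd_cycles, &cycles, sizeof(cycles)) == static_cast<ssize_t>(sizeof(cycles)) &&
        read(fd_insns, &insns, sizeof(insns)) == static_cast<ssize_t>(sizeof(insns));
    close(fd_cycles);
    close(fd_insns);
    if (!ok) return false;
    out = RankCounts{static_cast<long>(getpid()), cycles, insns};
    return true;
}

double ipc_of(const RankCounts& r) {
    assert(r.cycles > 0 && "no cycles were counted");
    return static_cast<double>(r.instructions) / static_cast<double>(r.cycles);
}

// Print in the per-process layout asked for in the question.
void print_rank_report(const RankCounts& r) {
    std::printf("PID: %ld\n", r.pid);
    if (r.cycles > 0) std::printf("IPC: %.2f\n", ipc_of(r));
}
```

In an MPI program, each rank would wrap its compute phase in count_process and then call print_rank_report. MPI_Comm_rank can supply the rank number if that is a more useful label than the PID.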
Answer: Basic profilers like gperf or gprof don't work well with MPI programs, but there are many profiling tools specifically designed to work with MPI that collect and report data for each MPI rank. Virtually all of them can collect hardware performance counters for cache misses. Here are a few options:
Decent HPC centers typically have one or more of them installed. Refer to the manuals to learn how to gather hardware counters.