I am trying to profile multi-threaded workloads with gperftools but am having difficulty interpreting the output produced. I have written a simple program that launches two threads with identical workloads and profiled with gperftools cpu profiler. In the output I can see the two functions for each thread but the overhead of each one varies hugely from one run to the next. I would expect both functions to show identical results, since they are identical workloads, but in fact, one might be 90% and the other 10%, or sometimes 80%/20%, 95%/5% etc. I don't understand why the functions are showing different overhead, or why the result changes so much from one run to the next. The benchmark runs for about 5 seconds and has 1600 samples so should be stable.
Is there any documentation that explains how profiling works for multithreaded workloads, and how to interpret the output? For example, does the profiler do a backtrace for every thread on every sample and if not, what is it doing?
#include <vector>
#include <cstdlib>
#include <thread>
using namespace std;
void thread_func() {
int size = 500000;
vector<int> V(size);
for(int i = 0; i < 100000; i++) {
V.erase(V.begin() + (rand() % size));
V.insert(V.begin() + (rand() % size), rand() % 10);
}
}
void thread_func2() {
int size = 500000;
vector<int> V(size);
for(int i = 0; i < 100000; i++) {
V.erase(V.begin() + (rand() % size));
V.insert(V.begin() + (rand() % size), rand() % 10);
}
}
int main() {
srand(1234);
thread t1(thread_func);
thread t2(thread_func2);
t1.join();
t2.join();
return 0;
}
Sample output:
0 0.0% 100.0% 1429 89.3% thread_func
0 0.0% 100.0% 172 10.7% thread_func2
Where is the 89.3% and 10.7% coming from? (These are the percentage of total samples in the function and its callees)
This is known issue with SIGPROF signal delivery. See https://github.com/golang/go/issues/14434 for some details.
Gperftools actually has "fix" for the skew (as mentioned on that issue). You just need to set CPUPROFILE_PER_THREAD_TIMERS=t and ensure that librt and libpthread is linked. And you'll also need to "register" your threads or LD_PRELOAD https://github.com/alk/gperf-all-threads