I know there is a profiling tool called flame graph that is able to quickly identify the performance bottleneck of a binary.
Actually, I don't need the performance stats. I am only interested in the stack traces. A flame graph visualizes the whole history of stack traces, so it can contain much more information than a traditional GDB crash stack trace, because it records function calls over the entire run rather than at a single moment.
So my question is: can a flame graph give me what I am looking for? I heard that flame graphs are built from samples, so I am afraid some function calls will be lost.
I found many flame graph examples on this webpage: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#C++. A flame graph is exactly what I am looking for, because I can zoom in and out to inspect the function calls. My only concern is that profiling tools such as perf collect stack traces by sampling at a fixed rate, so the flame graph might not faithfully show every function call.
In short, what I am looking for is: a flame graph minus the performance stats, plus a 100% complete function call history.
I just figured out a way to achieve my goal. It is a hack that leverages the existing flamegraph tooling.
Step 1: instrument the code base with a patch similar to the following:
Declare the thread-local stack somewhere:
Add the LOG_CALL macro at the beginning of each function you want to trace. There might be hundreds of them; I am not aware of any automatic tool, so I added it manually. The above code works with the clang compiler.
Step 2: compile the code and run a test case that you are interested in. This produces the traces. The content should look like this:
The above format is recognized by the flamegraph tool from https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl. I learned the format by trying this example: https://github.com/brendangregg/FlameGraph/blob/master/files.pl
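To make the format concrete: each line is a stack with frames separated by `;` (outermost caller first), followed by a space and a count. The small parser below is only a sketch for checking your own trace lines before handing them to the tool. `FoldedSample` and `parse_folded_line` are hypothetical names, and it assumes integer counts only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one parsed line of folded-stack input.
struct FoldedSample {
    std::vector<std::string> frames;  // outermost caller first
    unsigned long count = 0;
};

// Parse "frame1;frame2;frame3 N". Returns false if the line is malformed.
bool parse_folded_line(const std::string& line, FoldedSample& out) {
    // The count is whatever follows the last space on the line.
    const std::size_t sep = line.find_last_of(' ');
    if (sep == std::string::npos || sep == 0 || sep + 1 == line.size()) {
        return false;
    }

    // Assumption: the count is a plain non-negative integer.
    unsigned long count = 0;
    for (std::size_t i = sep + 1; i < line.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(line[i]))) return false;
        count = count * 10 + static_cast<unsigned long>(line[i] - '0');
    }

    // Split the stack part on ';' into individual frames.
    const std::string stack = line.substr(0, sep);
    std::vector<std::string> frames;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = stack.find(';', start);
        if (end == std::string::npos) {
            frames.push_back(stack.substr(start));
            break;
        }
        frames.push_back(stack.substr(start, end - start));
        start = end + 1;
    }

    out.frames = frames;
    out.count = count;
    return true;
}
```

For example, `main;outer;inner 1` parses into three frames with a count of 1. Since `;` is the separator, a function name that itself contains `;` would need to be sanitized before being written out.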
Step 3: run flamegraph.pl with the "--flamechart" option, because we want the X axis sorted by time and the "auto-merging" in flamegraph disabled.
The difference between a flame chart and a flame graph is this: a flame graph is designed to study performance bottlenecks, so identical stacks are merged (their sample counts are summed) regardless of when they occurred. For this post's purpose, we need the X axis sorted by time, with no such merging.
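The difference can be illustrated with a toy model. This is my simplified picture of the two modes, not the actual logic of flamegraph.pl. `Sample`, `merge_stacks`, and `keep_time_order` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: a folded stack string plus its count.
using Sample = std::pair<std::string, unsigned long>;

// Flame-graph-style view: identical stacks are summed no matter when they
// occurred. std::map orders keys alphabetically, so time order is lost.
std::map<std::string, unsigned long> merge_stacks(
        const std::vector<Sample>& samples) {
    std::map<std::string, unsigned long> merged;
    for (const Sample& s : samples) {
        merged[s.first] += s.second;
    }
    return merged;
}

// Flame-chart-style view: input order is preserved. Only directly adjacent
// identical stacks are fused, because they draw as one continuous frame.
std::vector<Sample> keep_time_order(const std::vector<Sample>& samples) {
    std::vector<Sample> ordered;
    for (const Sample& s : samples) {
        if (!ordered.empty() && ordered.back().first == s.first) {
            ordered.back().second += s.second;
        } else {
            ordered.push_back(s);
        }
    }
    return ordered;
}
```

Given the sequence `main;a`, `main;b`, `main;a`, the merged view reports `main;a` once with a count of 2, which hides the fact that `b` ran in between. The time-ordered view keeps all three entries, which is what a call history needs.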
Use the following command:
Then I got the flame chart I wanted in out.svg.