Is flamegraph able to show all function calls?


I know that flame graphs are a profiling visualization that makes it quick to identify the performance bottlenecks of a binary.

Actually, I don't need the performance stats. I am only interested in the stack traces. A flame graph can visualize the whole history of stack traces, so it can contain much more information than a traditional GDB crash stack trace, because it records the history of function calls.

So my question is: can a flame graph give me what I am looking for? I have heard that flame graphs are built from sampled data, so I am afraid they will miss function calls.

I found many flame graph examples on this page: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#C++. This kind of flame graph is exactly what I am looking for, because I can zoom in and out to inspect the function calls. My only concern is this: I have heard that profiling tools such as perf collect stack traces only by sampling at some rate, so the flame graph might not faithfully show every function call.

In short, what I am looking for is a flame graph minus the performance stats, plus a 100% complete function call history.

There is 1 answer

Answer by Dachuan Huang

I just figured out a way to achieve my goal. It is a hack that leverages the existing flamegraph tooling.

Step 1: instrument the code base with a patch similar to the following:

+#include <cstdio>    // printf
+#include <iostream>
+#include <sstream>
+#include <string>
+#include <utility>   // std::move
+#include <vector>
+
+extern thread_local std::vector<std::string> thread_local_stack;
+
+struct tracer_t {
+    tracer_t(std::string method) {
+        thread_local_stack.emplace_back(std::move(method));
+        std::ostringstream oss;
+        for (size_t i = 0; i < thread_local_stack.size(); ++i) {
+            if (i) {
+                oss << ";";
+            }
+            oss << thread_local_stack[i];
+        }
+        printf("%s 1\n", oss.str().c_str());  // one folded stack per line
+    }
+
+    ~tracer_t() {
+        thread_local_stack.pop_back();
+    }
+};
+
+inline std::string methodName(const std::string& prettyFunction)
+{
+    // Only look at the text before the argument list, so that spaces or
+    // "::" inside the arguments cannot confuse the search.
+    std::string head = prettyFunction.substr(0, prettyFunction.find('('));
+    size_t colons = head.find("::");
+    size_t begin = head.substr(0, colons).rfind(' ') + 1;
+
+    return head.substr(begin) + "()";
+}
+
+#define __METHOD_NAME__ methodName(__PRETTY_FUNCTION__)
+
+#define LOG_CALL tracer_t _token(__METHOD_NAME__)

Define the thread-local stack in exactly one source file:

+thread_local std::vector<std::string> thread_local_stack;

Add the LOG_CALL macro at the beginning of each function you want to trace. There might be hundreds of them; I am not aware of any tool that does this automatically, so I added them manually.

The above code works with the clang compiler (__PRETTY_FUNCTION__ is a compiler extension, not standard C++).
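To show where the macro goes, here is a minimal, self-contained sketch of the same RAII idea. It is not the patch above: the names `ScopedTrace`, `TRACE_CALL`, `call_stack`, and `folded_lines` are hypothetical, it uses the standard `__func__` instead of `__PRETTY_FUNCTION__`, and it collects the lines in memory instead of printing them, purely so the result is easy to inspect.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names; a simplified stand-in for the tracer in the patch.
thread_local std::vector<std::string> call_stack;    // currently active frames
thread_local std::vector<std::string> folded_lines;  // one entry per traced call

struct ScopedTrace {
    explicit ScopedTrace(const char* name) {
        call_stack.emplace_back(name);
        // Join the active frames with ';' to build one folded stack line.
        std::string line;
        for (std::size_t i = 0; i < call_stack.size(); ++i) {
            if (i) line += ';';
            line += call_stack[i];
        }
        folded_lines.push_back(line + " 1");
    }
    // Leaving the function (normally or by exception) pops its frame.
    ~ScopedTrace() { call_stack.pop_back(); }
};

// Place this as the first statement of every function to be traced.
#define TRACE_CALL ScopedTrace trace_token_(__func__)

void func3() { TRACE_CALL; }
void func4() { TRACE_CALL; }
void func2() { TRACE_CALL; func3(); func4(); }
void func1() { TRACE_CALL; func2(); }
```

Calling `func1()` once leaves four lines in `folded_lines`, one per call, each listing the full chain of callers at the moment the function was entered.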

Step 2: compile the code and run a test case that you are interested in. The program prints the traces to standard output; the content should look like this:

func1 1
func1;func2 1
func1;func2;func3 1
func1;func2;func4 1
func1;func2;func3 1
func2 1
...

The above format is recognized by the flamegraph tool at https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl. I learned this format by trying this example: https://github.com/brendangregg/FlameGraph/blob/master/files.pl
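To spell out the format, each line is a semicolon-separated stack (outermost caller first), then a space, then a count. The sketch below parses one such line. It is my own illustration of the format as I understand it, not code taken from flamegraph.pl, and `FoldedSample` and `parse_folded` are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: one parsed line of folded-stack input.
struct FoldedSample {
    std::vector<std::string> frames;  // outermost caller first
    long count = 0;
};

// Returns true and fills `out` if `line` looks like "a;b;c 1".
// Assumption of this sketch: the count is a plain non-negative integer
// that fits in a long.
bool parse_folded(const std::string& line, FoldedSample& out) {
    const std::size_t space = line.rfind(' ');
    if (space == std::string::npos) return false;

    const std::string count_text = line.substr(space + 1);
    if (count_text.empty() ||
        count_text.find_first_not_of("0123456789") != std::string::npos) {
        return false;
    }
    out.count = std::stol(count_text);

    // Split the stack part on ';' into individual frames.
    out.frames.clear();
    std::stringstream stack(line.substr(0, space));
    std::string frame;
    while (std::getline(stack, frame, ';')) out.frames.push_back(frame);
    return !out.frames.empty();
}
```

With the tracer from step 1 the count is always 1, because every line stands for exactly one function call rather than a number of samples.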

Step 3: run flamegraph.pl with the "--flamechart" option, because we want the X axis sorted by time and the "auto-merging" of flamegraph disabled.

The difference between a flame chart and a flame graph is this: a flame graph is designed for studying performance bottlenecks, so identical stacks are merged and their counts summed. For the purpose of this post, we need the X axis ordered by time, with no merging.
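To make the difference concrete, here is a toy model of the two views. It is only a conceptual sketch, not how flamegraph.pl is implemented; `Sample`, `merged_view`, and `time_ordered_view` are hypothetical names, and combining back-to-back repeats in the time-ordered view is an assumption of this sketch.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: a folded stack and its count.
using Sample = std::pair<std::string, long>;

// Flame graph style: identical stacks are summed wherever they occur,
// and the result comes out sorted by name, so call order is lost.
std::vector<Sample> merged_view(const std::vector<Sample>& input) {
    std::map<std::string, long> totals;
    for (const auto& s : input) totals[s.first] += s.second;
    return std::vector<Sample>(totals.begin(), totals.end());
}

// Flame chart style: the input order (time) is preserved. Only a stack
// that immediately repeats the previous one is combined with it.
std::vector<Sample> time_ordered_view(const std::vector<Sample>& input) {
    std::vector<Sample> out;
    for (const auto& s : input) {
        if (!out.empty() && out.back().first == s.first) {
            out.back().second += s.second;
        } else {
            out.push_back(s);
        }
    }
    return out;
}
```

For the six sample lines in step 2, the merged view has five entries, because the two `func1;func2;func3` calls collapse into one with count 2. The time-ordered view keeps all six, which is what a call history needs.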

Use the following command:

cat ~/d/output.txt |  ./flamegraph.pl --hash --countname=bytes --flamechart  > /tmp/out.svg

This produced the flame chart I wanted in /tmp/out.svg.