How to get a function-duration breakdown in go (profiling)

6.3k views Asked by At

Update (Jan 24, 2019):

This question was asked 4 years ago about Go 1.4 (and is still getting views). Profiling with pprof has changed dramatically since then.

Original Question:

I'm trying to profile a go martini based server I wrote, I want to profile a single request, and get the complete breakdown of the function with their runtime duration. I tried playing around with both runtime/pprof and net/http/pprof but the output looks like this:

Total: 3 samples
       1  33.3%  33.3%        1  33.3% ExternalCode
       1  33.3%  66.7%        1  33.3% runtime.futex
       1  33.3% 100.0%        2  66.7% syscall.Syscall

The web view is not very helpful either.

We regularly profile another program, and the output seems to be what I need:

20ms of 20ms total (  100%)
      flat  flat%  sum%        cum  cum%
      10ms 50.00% 50.00%     10ms 50.00%  runtime.duffcopy
      10ms 50.00%   100%     10ms 50.00%  runtime.fastrand1
         0     0%   100%     20ms   100%  main.funcĀ·004
         0     0%   100%     20ms   100%  main.pruneAlerts
         0     0%   100%     20ms   100%  runtime.memclr

I can't tell where the difference is coming from.

1

There are 1 answers

3
JimB On BEST ANSWER

pprof is a timer based sampling profiler, originally from the gperftools suite. Rus Cox later ported the pprof tools to Go: http://research.swtch.com/pprof.

This timer based profiler works by using the system profiling timer, and recording statistics whenever it receives SIGPROF. In go, this is currently set to a constant 100Hz. From pprof.go:

// The runtime routines allow a variable profiling rate,
// but in practice operating systems cannot trigger signals
// at more than about 500 Hz, and our processing of the
// signal is not cheap (mostly getting the stack trace).
// 100 Hz is a reasonable choice: it is frequent enough to
// produce useful data, rare enough not to bog down the
// system, and a nice round number to make it easy to
// convert sample counts to seconds.  Instead of requiring
// each client to specify the frequency, we hard code it.
const hz = 100

You can set this frequency by calling runtime.SetCPUProfileRate and writing the profile output yourself, and Gperftools allows you to set this frequency with CPUPROFILE_FREQUENCY, but in practice it's not that useful.

In order to sample a program, it needs to be doing what you're trying to measure at all times. Sampling the idle runtime isn't showing anything useful. What you usually do is run the code you want in a benchmark, or in a hot loop, using as much CPU time as possible. After accumulating enough samples, there should be a sufficient number across all functions to show you proportionally how much time is spent in each function.

See also: