x86/Linux multithreading: perf report children percentage sum does not match the parent percentage


Consider the following simple example:

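#define _GNU_SOURCE

#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sched.h>
#include <sys/syscall.h>
#include <linux/futex.h>

#define STACK_SIZE 4096

volatile atomic_int variable;
int futex_word;

/* Child entry point: loop forever doing a CAS and a FUTEX_WAKE syscall. */
int foo(void *v)
{
    (void)v;
    while (1)
    {
        int expected = atomic_load(&variable);
        atomic_compare_exchange_strong(&variable, &expected, expected + 1);
        syscall(SYS_futex, &futex_word, FUTEX_WAKE, 1);
    }
}

int main(void)
{
    char *stack_base = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (stack_base == MAP_FAILED)
    {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* The stack grows down, so clone() gets the top of the mapping.
       CLONE_CHILD_SETTID requires the child_tid pointer (last argument).
       Without CLONE_VM the child runs in its own copy of the address space. */
    pid_t child_tid;
    if (clone(foo, stack_base + STACK_SIZE, CLONE_CHILD_SETTID, NULL,
              NULL, NULL, &child_tid) == -1)
    {
        perror("clone");
        return EXIT_FAILURE;
    }
    sleep(1000);
}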

Running this example with

sudo perf record --call-graph dwarf,16384 -F 9123 ./main

and then

sudo perf report

I got the following strange result, which I don't know how to interpret:

[Screenshot: perf report output with the foo entry expanded]

In the image, the first expanded symbol, foo, has two child entries:

  • 95.53% foo
  • 0.93% __GI___clone

The problem is that 0.93% + 95.53% = 96.46%, which does not match the 96.52% shown for the symbol foo.

Why is there such a mismatch? I see this fairly often when profiling different binaries, and I'm not sure whether it can be interpreted as a measurement error.
