I am trying to use PerfSuite (which uses PAPI internally) to measure some performance counters around a function call. The function spawns one thread per core. If I start the counters before the call and stop them after it, the values I get are wrong. If the function does not create any threads, the values are correct.
I know psrun can collect counters across all cores for a whole executable, but I want the same thing for a single function call, not an entire executable.
I am using PerfSuite 1.1.1 with PAPI 4.4.0, called from C on Debian.
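PAPI counts are per thread, not per core: by default, counters started in one thread do not include events generated by the threads it spawns, which is why your numbers look wrong once the function goes parallel. If you want per-core counts instead, consider Intel PCM (Performance Counter Monitor), which can report them directly.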
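The per-thread behaviour is easy to reproduce without PAPI at all. The sketch below uses the POSIX per-thread CPU clock (CLOCK_THREAD_CPUTIME_ID) as a stand-in for a hardware counter, because it is likewise attributed only to the calling thread. The workload, NUM_WORKERS and the function names are hypothetical. Reading the clock in the calling thread around run_parallel_and_sum() sees very little, since the work happens in the spawned threads; starting and stopping the "counter" inside each worker and summing the deltas recovers the total. With PAPI the analogous pattern would presumably be calling PAPI_thread_init() once in the main thread and having each worker create, start and stop its own event set, but check the PAPI 4.4 documentation for the exact requirements.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define NUM_WORKERS 4

/* Stand-in for a per-thread hardware counter: CPU time consumed by the
   calling thread only, in nanoseconds. Work done by other threads is
   not included, just like a PAPI event set started in this thread. */
long long thread_cpu_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

typedef struct {
    long long measured_ns; /* this worker's own counter delta */
} worker_result;

/* Hypothetical workload: each worker starts its own "counter",
   burns some CPU, then stops the counter and records the delta. */
static void *worker(void *arg)
{
    worker_result *res = arg;
    long long start = thread_cpu_ns();

    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 20000000UL; i++)
        sink += i;
    (void)sink;

    res->measured_ns = thread_cpu_ns() - start;
    return NULL;
}

/* The function under measurement: spawns NUM_WORKERS threads, joins
   them, and returns the sum of the per-thread deltas (or -1 if a
   thread could not be created). */
long long run_parallel_and_sum(void)
{
    pthread_t tids[NUM_WORKERS];
    worker_result results[NUM_WORKERS];
    int created = 0;

    for (; created < NUM_WORKERS; created++) {
        if (pthread_create(&tids[created], NULL, worker, &results[created]) != 0)
            break;
    }

    long long total = 0;
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
        total += results[i].measured_ns;
    }
    return created == NUM_WORKERS ? total : -1;
}
```

If you call thread_cpu_ns() before and after run_parallel_and_sum() in the main thread, the delta is small because that thread mostly blocks in pthread_join(), while the returned sum reflects the work actually done by the workers. That mirrors the symptom in the question: measure in each thread and aggregate, or switch to a tool that counts per core.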
Does this answer your question?
tjr