I observe almost 50% overhead on some frontend-bound microbenchmarks even though I don't instrument the code. There are no callbacks; the pintool just attaches to the microbenchmark using its PID. What is causing this overhead, and is there a way to overcome it? Thanks!
I tried removing all callbacks, so the pintool only attaches to the benchmark by PID and doesn't instrument anything. VTune reports that PinCRT is consuming a lot of time.
[EDIT] Below is a VTune Threading bottom-up snapshot, taken while attaching the pintool to a game binary.

[Screenshot: VTune bottom-up view]

[Screenshot: function that takes the most time]