Is it possible to measure the number of successful store-forwarding operations using the performance counters on recent Intel x86 chips?
I see events for ld_blocks.store_forward, which measure failed store forwarding, but it's not clear to me whether the successful case can be measured.
There is no documented event that counts successful store-forwarding operations. However, I have experimentally determined a set of undocumented events that serve this purpose on Haswell and Broadwell. In particular, any event with event code 0x02 and an odd umask value (any odd number, such as 0x01) appears to count successful store forwarding very accurately: the counts are as expected and the standard deviation is practically zero. I think the same events can be used on later (and even earlier) microarchitectures. Again, none of these events are documented, so treat the results with care.
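To try this with Linux perf, the event code and umask have to be packed into a raw event. The following is a minimal sketch, assuming the standard Intel event-select layout (event code in bits 0-7, umask in bits 8-15) and perf's `rNNNN` raw-event syntax. The helper names `raw_event` and `perf_stat_command` and the workload `./a.out` are hypothetical placeholders, and the event itself is undocumented, so its meaning is not guaranteed.

```python
def raw_event(event_code, umask):
    """Encode an event code / umask pair as a perf raw event string."""
    if not (0 <= event_code <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event code and umask must each fit in 8 bits")
    # Bits 0-7 hold the event code, bits 8-15 hold the umask.
    config = (umask << 8) | event_code
    return f"r{config:04x}"


def perf_stat_command(workload):
    """Build a perf stat command line counting both forwarding outcomes."""
    events = [
        raw_event(0x02, 0x01),      # undocumented: apparently successful forwarding
        "ld_blocks.store_forward",  # documented: failed forwarding
    ]
    return ["perf", "stat", "-e", ",".join(events), "--", *workload]


# "./a.out" stands in for whatever benchmark you want to measure.
print(" ".join(perf_stat_command(["./a.out"])))
# prints: perf stat -e r0102,ld_blocks.store_forward -- ./a.out
```

Counting the documented failed-forwarding event alongside the raw one makes it easier to sanity-check the numbers against a loop with a known number of store-then-load pairs.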