My question is very simple.
Why is InterlockedIncrement slower with multiple threads than with a single thread?
Is it just because of cache line bouncing (cache line contention)?
Or is there another reason?
I'm using an Intel i7 and Visual Studio 2012. I tested counting with the InterlockedIncrement function, and these were the results:
1 thread - 610385971
2 threads - 497804468
3 threads - 351516659
4 threads - 333275249
If I understand the code from your verbal description correctly, then yes, the main reason for the performance degradation is competition between cores for the same cache line. To execute an interlocked increment, a core must first obtain the cache line in the Exclusive (E) state via a coherence protocol such as MESIF (https://en.wikipedia.org/wiki/MESIF_protocol). That requires inter-core coordination, so it is visibly slower than executing interlocked increments on a single core, where the line stays in that core's cache.
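To see the effect in isolation, here is a minimal sketch rather than your actual test. It assumes a C++11 compiler and uses std::atomic's fetch_add as a portable stand-in for InterlockedIncrement. The function names count_shared and count_padded and the 64-byte cache line size are my assumptions, not something from your code. The first function makes every thread hit one counter (one cache line); the second gives each thread its own counter, padded so that no two counters share a line.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is typical for Intel i7 but not guaranteed.
const std::size_t kAssumedCacheLine = 64;

// All threads increment the same counter, so the cache line holding it
// has to move between cores on (almost) every increment.
std::uint64_t count_shared(unsigned threads, std::uint64_t per_thread) {
    std::atomic<std::uint64_t> counter(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Each counter is padded out to the assumed line size, so neighbouring
// counters in the vector never fall into the same cache line.
struct PaddedCounter {
    std::atomic<std::uint64_t> value;
    char pad[kAssumedCacheLine - sizeof(std::atomic<std::uint64_t>)];
    PaddedCounter() : value(0) {}
};

// Each thread increments only its own counter; the totals are summed
// after all threads have finished.
std::uint64_t count_padded(unsigned threads, std::uint64_t per_thread) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (unsigned t = 0; t < threads; ++t) total += counters[t].value.load();
    return total;
}
```

Both functions return the same total (threads * per_thread). If you time them with the same arguments, I would expect count_padded to scale with the thread count while count_shared does not, but the exact numbers depend on the machine, so treat that as something to measure rather than a promise.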