I was reading this question about using a bool for thread control and got intrigued by this answer by @eran:
Using volatile is enough only on single cores, where all threads use the same cache. On multi-cores, if stop() is called on one core and run() is executing on another, it might take some time for the CPU caches to synchronize, which means two cores might see two different views of isRunning_.
If you use synchronization mechanisms, they will ensure all caches get the same values, in the price of stalling the program for a while. Whether performance or correctness is more important to you depends on your actual needs.
I have spent over an hour searching for some statement that says synchronization primitives force cache coherency but have failed. The closest I have come is Wikipedia:
The keyword volatile does not guarantee a memory barrier to enforce cache-consistency.
Which suggests that memory barriers do force cache consistency, and since some synchronization primitives are implemented using memory barriers (again from Wikipedia) this is some "evidence".
But I don't know enough to be certain whether to believe this or not, and be sure that I'm not misinterpreting it.
Can someone please clarify this?
As I understand, synchronization primitives won't affect cache coherency at all. Cache is French for hidden, it's not supposed to be visible to the user. A cache coherency protocol should work without the programmer's involvement.
Synchronization primitives will affect the memory ordering, which is well defined and visible to the user through the processor's ISA.
A good source with detailed information is A Primer on Memory Consistency and Cache Coherence from the Synthesis Lectures on Computer Architecture collection.
EDIT: To clarify your doubt
The Wikipedia statement is slightly wrong. I think the confusion might come from the terms memory consistency and cache coherency. They don't mean the same thing.
The
volatile
keyword in C means that the variable is always read from memory (as opposed to a register) and that the compiler won't reorder loads/stores around it. It doesn't mean the hardware won't reorder the loads/stores. This is a memory consistency problem. When using weaker consistency models the programmer is required to use synchronization primitives to enforce a specific ordering. This is not the same as cache coherency. For example, if thread 1 modifies location A, then after this event thread 2 loads location A, it will receive an updated (consistent) value. This should happen automatically if cache coherency is used. Memory ordering is a different problem. You can check out the famous paper Shared Memory Consistency Models: A Tutorial for more information. One of the better known examples is Dekker's Algorithm which requires sequential consistency or synchronization primitives.EDIT2: I would like to clarify one thing. While my cache coherency example is correct, there is a situation where memory consistency might seem to overlap with it. This when stores are executed in the processor but delayed going to the cache (they are in a store queue/buffer). Since the processor's cache hasn't received an updated value, the other caches won't either. This may seem like a cache coherency problem but in reality it is not and is actually part of the memory consistency model of the ISA. In this case synchronization primitives can be used to flush the store queue to the cache. With this in mind, the Wikipedia text that you highlighted in bold is correct but this other one is still slightly wrong: The keyword volatile does not guarantee a memory barrier to enforce cache-consistency. It should say: The keyword volatile does not guarantee a memory barrier to enforce memory consistency.