The failure of Dekker-style synchronization is typically explained by instruction reordering. That is, if we write
atomic_int X;
atomic_int Y;
int r1, r2;
static void t1() {
X.store(1, std::memory_order_relaxed);
r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
Y.store(1, std::memory_order_relaxed);
r2 = X.load(std::memory_order_relaxed);
}
Then the loads can be reordered with the stores, leading to r1==r2==0.
I was expecting an acq_rel fence to prevent this kind of reordering:
static void t1() {
X.store(1, std::memory_order_relaxed);
atomic_thread_fence(std::memory_order_acq_rel);
r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
Y.store(1, std::memory_order_relaxed);
atomic_thread_fence(std::memory_order_acq_rel);
r2 = X.load(std::memory_order_relaxed);
}
The load cannot be moved above the fence and the store cannot be moved below the fence, and so the bad result should be prevented.
However, experiments show r1==r2==0 can still occur. Is there a reordering-based explanation for this? Where's the flaw in my reasoning?
As I understand it (mainly from reading Jeff Preshing's blog), an atomic_thread_fence(std::memory_order_acq_rel) prevents any reordering except StoreLoad, i.e., it still allows a Store to be reordered with a subsequent Load. However, this is exactly the reordering that has to be prevented in your example.
More precisely, an atomic_thread_fence(std::memory_order_acquire) prevents the reordering of any previous Load with any subsequent Store and any subsequent Load, i.e., it prevents LoadLoad and LoadStore reorderings across the fence.
An atomic_thread_fence(std::memory_order_release) prevents the reordering of any subsequent Store with any preceding Store and any preceding Load, i.e., it prevents LoadStore and StoreStore reorderings across the fence.
An atomic_thread_fence(std::memory_order_acq_rel) then prevents the union, i.e., it prevents LoadLoad, LoadStore, and StoreStore, which means that only StoreLoad may still happen.
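If StoreLoad is the reordering that has to go, the usual remedy is a sequentially consistent fence, since std::memory_order_seq_cst is the only fence order that also rules it out. Below is a minimal sketch of the example from the question with the fences changed accordingly. The function name run_once and the per-round reset of X and Y are my own additions for illustration and are not part of the original code.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic_int X{0};
std::atomic_int Y{0};

// Hypothetical helper: runs one round of the test with seq_cst fences
// and returns the values (r1, r2) observed by the two threads.
std::pair<int, int> run_once() {
    // Reset the shared variables; thread creation below orders these
    // stores before anything the two threads do.
    X.store(0, std::memory_order_relaxed);
    Y.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        X.store(1, std::memory_order_relaxed);
        // A seq_cst fence also forbids StoreLoad reordering across it.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = Y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        Y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = X.load(std::memory_order_relaxed);
    });

    a.join();
    b.join();
    return {r1, r2};
}
```

With both fences being seq_cst, they are totally ordered with respect to each other, so at least one of the loads must observe the other thread's store and r1==r2==0 should no longer occur. Making the stores and loads themselves seq_cst (the default for std::atomic) would be another way to get the same guarantee.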