Should load-acquire see store-release immediately?


Suppose we have one simple variable (std::atomic<int> var) and two threads, T1 and T2, with the following code for T1:

...
var.store(2, mem_order);
...

and for T2:

...
var.load(mem_order)
...

Also, let's assume that T2 (load) executes 123 ns later in time (later in the modification order, in terms of the C++ standard) than T1 (store). My understanding of this situation is as follows (for the different memory orders):

  1. memory_order_seq_cst - the T2 load is obliged to load 2. So effectively it has to load the latest value (just as is the case with RMW operations).
  2. memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value, with one restriction: the value must not be older than the latest one already loaded by that thread. So, for example, var.load may return 0.

Is my understanding correct?

UPDATE1:

If my reasoning is wrong, please provide the text from the C++ standard that proves it, not just theoretical reasoning about how some architecture might work.


There are 3 answers

3
ixSci On BEST ANSWER

Having found no arguments that prove my understanding wrong, I deem it correct. My proof is as follows:

memory_order_seq_cst - T2 load is obliged to load 2.

That's correct because all operations using memory_order_seq_cst must form a single total order. Excerpt from the standard:

[29.3/3] There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values <...>

The next point of my question:

memory_order_acquire/memory_order_release/memory_order_relaxed - T2 is not obliged to load 2 but can load any older value <...>

I didn't find any evidence that a load executed later in the modification order must see the latest value. The only statements I found about store/load operations with a memory order other than memory_order_seq_cst are these:

[29.3/12] Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

and

[1.10/28] An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

So the only guarantee we have is that the written value will become visible within some time. That's a pretty reasonable guarantee, but it doesn't imply immediate visibility of the previous store. And it proves my second point.

Given all that, my initial understanding was correct.
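The "finite period of time" wording can be illustrated with a small spin loop. This is only a sketch: the helper name `spin_until_visible` and the stale-read counter are made up for illustration, and termination relies on the "should" recommendations quoted above rather than on a hard requirement.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: T2 spins with relaxed loads until it observes the
// value stored by T1, and reports how many stale reads it made on the way.
long spin_until_visible() {
    std::atomic<int> var{0};
    long stale_reads = 0;  // written only by T2, read only after join()

    std::thread t2([&] {
        // Each load may still return the old value 0; nothing in the
        // standard says the store must be visible "immediately".
        while (var.load(std::memory_order_relaxed) != 2) {
            ++stale_reads;
        }
    });
    std::thread t1([&] {
        var.store(2, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return stale_reads;  // zero or more, depending on timing
}
```

Calling `spin_until_visible()` from `main` returns a count that varies from run to run; the only thing one can rely on in practice is that the loop ends.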

8
Tsyvarev On

Am I right with my understanding?

No. You misunderstand memory orders.

let's assume that T2(load) executes 123ns later than T1(store)...

In that case, T2 will see what T1 did, whatever memory order is used (moreover, this property applies to reads and writes of any memory region; see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4431.pdf, 1.10, p.15). The key word in your phrase is "later": it means that something else forces an ordering of these operations.

Memory orders are meant for a different scenario:

Let some operation OP1 come before the store operation in thread T1 and OP2 come after it; let OP3 come before the load operation in thread T2 and OP4 come after it.

//T1:                         //T2:
OP1                           OP3
var.store(2, mem_order)       var.load(mem_order)
OP2                           OP4

Assume that some order between var.store() and var.load() can be observed by the threads. What can one guarantee about the cross-thread order of the other operations?

  1. If var.store uses memory_order_release, var.load uses memory_order_acquire and var.store is ordered before var.load (that is, the load returns 2), then the effect of OP1 is ordered before OP4.

E.g., if OP1 writes some variable var1 and OP4 reads that variable, then one can be sure that OP4 will read what OP1 wrote. This is the most commonly used case.
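The release/acquire case can be sketched as follows. The function name `message_passing` and the value 42 are illustrative assumptions; the spin loop is just one way to make sure the acquire load actually returns 2 before OP4 runs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value OP4 reads from var1.
int message_passing() {
    int var1 = 0;              // plain, non-atomic data
    std::atomic<int> var{0};
    int seen = -1;             // written by T2, read after join()

    std::thread t1([&] {
        var1 = 42;                                   // OP1
        var.store(2, std::memory_order_release);     // release store
    });
    std::thread t2([&] {
        // Wait until the acquire load returns 2, i.e. until the store
        // is ordered before this load.
        while (var.load(std::memory_order_acquire) != 2) {
        }
        seen = var1;                                 // OP4: must read 42
    });

    t1.join();
    t2.join();
    return seen;
}
```

Because the acquire load that returns 2 synchronizes with the release store, the plain read of `var1` is free of data races and yields the value written by OP1.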

  2. If both var.store and var.load use memory_order_seq_cst and var.store is ordered after var.load (that is, the load returns 0, which was the value of the variable before the store), then the effect of OP2 is ordered after OP3.

This memory order is required by some tricky synchronization schemes.
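One scheme usually cited as needing memory_order_seq_cst is the store-buffering (Dekker-style) pattern, sketched below. It uses two flags, `x` and `y`, which are assumptions for illustration and not part of the question; the helper name `count_both_zero` is also made up. With a single total order over all seq_cst operations, at least one thread has to observe the other thread's store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the pattern `rounds` times and counts the runs
// in which both threads read 0 (which seq_cst is expected to rule out).
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // each written by one thread, read after join()

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With release/acquire or relaxed orders instead, the outcome `r1 == 0 && r2 == 0` would be permitted, which is why this pattern is a common motivation for seq_cst.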

  3. If either var.store or var.load uses memory_order_relaxed, then, whatever the order of var.store and var.load, nothing can be guaranteed about the cross-thread order of the other operations.

This memory order is used when something else ensures the order of operations. E.g., if thread T2 is created after var.store in T1, then OP3 and OP4 are ordered after OP1.
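The thread-creation case can be sketched like this. The helper name `relaxed_after_thread_start` and the value 7 are illustrative assumptions. The ordering here comes from the `std::thread` constructor, whose completion synchronizes with the start of the new thread's function, not from the relaxed memory order.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: returns {value loaded from var, value read from var1}.
std::pair<int, int> relaxed_after_thread_start() {
    int var1 = 0;              // plain, non-atomic data
    std::atomic<int> var{0};

    var1 = 7;                                   // OP1 (in T1)
    var.store(2, std::memory_order_relaxed);    // relaxed store (in T1)

    int loaded = -1;
    int seen = -1;
    // T2 is created only after the store, so everything above
    // happens before the body of the lambda.
    std::thread t2([&] {
        loaded = var.load(std::memory_order_relaxed);  // sees 2
        seen = var1;                                   // OP4: sees 7
    });
    t2.join();

    return std::make_pair(loaded, seen);
}
```

Here the relaxed load has to return 2 because the store happens before it through thread creation, so the memory order itself has no synchronization work left to do.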

UPDATE: "123 ns later" implies that *something else* forces the ordering, because a computer's processor has no notion of universal time, and no operation has a precise moment at which it executes. To measure the time between two operations you have to:

  1. Observe an ordering between the end of the first operation and the start of the time-counting operation on some CPU.
  2. Observe an ordering between the start and the end of the time-counting operation.
  3. Observe an ordering between the end of the time-counting operation and the start of the second operation.

Transitively, these steps establish an ordering between the first operation and the second one.

0
hotpaw2 On

"123 ns later" doesn't guarantee that T2 sees the results of T1. If the physical program counter (transistors, etc.) running T2 is more than 40 meters away from the physical program counter running T1 (a large multi-core supercomputer, etc.), then the speed of light does not allow the state written by T1 to have propagated that far yet. The effect is similar if the physical memory used for the loads and stores is some distance away from both thread processors.