Written in cppmem pseudo code:

int main()                                                                                                                                                                                   
  atomic_int n = -1;                                                                                                                                                                         
  atomic_int n2 = 0;                                                                                                                                                                         

        n2.store(1, mo_relaxed);                                                                                                                                                             
        if (n.load(mo_relaxed) != -1)                                                                                                                                                        
          n.store(1, mo_release);                                                                                                                                                            
        n.store(0, mo_release);                                                                                                                                                              
        int expected = 0;                                                                                                                                                                    
          desired = n2.load(mo_relaxed);                                                                                                                                                     
        while (!n.compare_exchange_strong(&expected, desired, mo_acquire));                                                                                                                  

  assert(n == 1);                                                                                                                                                                            

In words, two atomic variables are initialized as n = -1 and n2 = 0;

Thread 1 first writes 1 to n2 and then to n, provided n wasn't (still) -1.

Thread 2 first writes 0 to n, then loads n2 and assigns n = n2 as long as n wasn't changed since it last read n (or when n is still 0).

After both threads joined n must equal 1 in every possible execution.

This code is part of an open source project of me and related to resetting a streambuf implementation to the start of the buffer lock-free while two threads concurrently read and write to it. This particular part has to do with 'sync-ing' (or, flushing written output).

I designed this and it works when every operation is sequentially consistent (this was brute force tested), but I can't wrap my head around the memory order requirements :/.

2 Answers

Oliv On Best Solutions

This assertion could fire if instructions (and cache updates) are performed in this order:

  • The first threads run all its instructions. So it just change the value of n2 from 0 to 1.
  • Then threads 2 runs. First it changes the value of n from -1 to 0.
  • Then threads 2 loads n2 (in n2.load(mo_relaxed)). At this point there are no synchronizations so any value previously stored in n2 (including the initialization value, see [intro.race]/1) can be loaded. Let's say it loads 0.
  • So threads 2 variable values are n==0 (the last one in the modification order of n),n2==0, expected==0, desired==0 before the compare exchange instruction. Then, the compare exchange succeeds, and stores 0 in n.

At the end of the execution of the two threads you get n==0 and n2==1.

With sequential consistency what I described cannot happen because if thread 1 saw n2==1 && n==-1, thread 2 could not see n2==0 && n==0.

With this algorithm I am sure it is not possible to use any other memory order than sequential consistency.

user2107181 On

Using a tool that I found on https://plv.mpi-sws.org/rcmc/ I was able to experimentally find that the most relaxed requirements are:


n2.store(1, std::memory_order_seq_cst);
if (n.load(std::memory_order_seq_cst) != -1)
  n.store(1, std::memory_order_release);


n.store(0, std::memory_order_seq_cst);
int expected = 0;
int desired;
    desired = n2.load(std::memory_order_seq_cst);
while (!n.compare_exchange_strong(expected, desired,
    std::memory_order_acquire, std::memory_order_relaxed));


A more recent tool by the same authors (which is also better, of course) can now be downloaded from https://github.com/MPI-SWS/genmc

This tool turns out to be extremely fast and useful, even for testing Real-Life algorithms that use weakly ordered atomics, as for example I am doing here: genmc_buffer_reset_test.c

The #include's at the given line are generated C files, extracted from my C++ code and transformed to C with awk scripts, because it seems to me that genmc can (unfortunately) only be used for C code(?).