volatile variable and atomic operations on Visual C++ x86


Plain load has acquire semantics on x86, plain store has release semantics; however, the compiler can still reorder instructions. While fences and locked instructions (lock xchg, lock cmpxchg) prevent both the hardware and the compiler from reordering, plain loads and stores still need to be protected with compiler barriers. Visual C++ provides the _ReadWriteBarrier() intrinsic, which prevents the compiler from reordering, and C++ also provides the volatile keyword for the same purpose. I write all this just to make sure I have everything right. So, assuming everything above is true, is there any reason to mark variables as volatile if they are only used in functions protected with _ReadWriteBarrier()?

For example:

int load(int& var)
{
    _ReadWriteBarrier();
    int value = var;
    _ReadWriteBarrier();
    return value;
}

Is it safe to make that variable non-volatile? As far as I understand it is, because the function is protected and no reordering can be done by the compiler inside it. On the other hand, Visual C++ gives volatile variables special behavior (different from what the standard requires): it makes volatile reads and writes atomic loads and stores. But my target is x86, and plain loads and stores are supposed to be atomic on x86 anyway, right?

Thanks in advance.


There are 2 answers

abRao (BEST ANSWER)

The volatile keyword is available in C too. volatile is often used in embedded systems, especially when the value of a variable may change at any time without any action being taken by the code. Three common scenarios are: reading from a memory-mapped peripheral register, global variables modified by an interrupt service routine, and global variables shared within a multi-threaded program.
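
For the embedded scenarios, a minimal sketch (the ISR name and flag are made up for illustration) of why volatile matters: without it the compiler may cache the flag in a register and never re-read it.

// Hypothetical flag set by an interrupt service routine behind the compiler's back.
volatile bool g_dataReady = false;

// Hypothetical ISR; how it gets registered is platform-specific.
void uart_rx_isr()
{
    g_dataReady = true;
}

void wait_for_data()
{
    // Because g_dataReady is volatile, each iteration re-reads it from memory
    // instead of spinning on a stale register copy.
    while (!g_dataReady)
    {
        // spin
    }
}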

So it is the last scenario where volatile could be considered to be similar to _ReadWriteBarrier.

_ReadWriteBarrier is not a function: it is a compiler intrinsic that does not insert any additional instructions, and it does not prevent the CPU from rearranging reads and writes; it only prevents the compiler from rearranging them. In short, _ReadWriteBarrier prevents compiler reordering.

MemoryBarrier is to prevent CPU reordering!

A compiler typically rearranges instructions. C++ does not contain built-in support for multithreaded programs, so the compiler assumes the code is single-threaded when reordering it. With MSVC, use _ReadWriteBarrier in the code so that the compiler will not move reads and writes across it.
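
As a rough sketch of how the two kinds of barrier are combined on MSVC (the publish/consume pattern and the names here are my own illustration, not from the question):

#include <windows.h>   // MemoryBarrier
#include <intrin.h>    // _ReadWriteBarrier

int g_payload = 0;
int g_ready   = 0;

void publisher()
{
    g_payload = 42;
    _ReadWriteBarrier();   // compiler: keep the payload store above the flag store
    MemoryBarrier();       // CPU: full fence (redundant for store ordering on x86, needed on weaker CPUs)
    g_ready = 1;
}

int consumer()
{
    while (g_ready == 0)
    {
        _ReadWriteBarrier(); // compiler: force g_ready to be re-read each iteration
    }
    MemoryBarrier();         // CPU: do not start the payload read before the flag read completes
    _ReadWriteBarrier();     // compiler: same restriction for the optimizer
    return g_payload;
}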

Check this link for a more detailed discussion of these topics: http://msdn.microsoft.com/en-us/library/ee418650(v=vs.85).aspx

Regarding your code snippet: you do not have to use _ReadWriteBarrier as a sync primitive there; the first call to _ReadWriteBarrier is not necessary.

When using _ReadWriteBarrier you do not have to use volatile.

You wrote "it makes volatile reads and writes atomic loads and stores" - I don't think it is correct to say that; atomicity and volatility are different things. Atomic operations are considered to be indivisible - ... http://www.yoda.arachsys.com/csharp/threads/volatility.shtml
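
To make the distinction concrete, a small sketch (the counter and function names are made up): volatile only controls how the individual accesses are emitted, while an interlocked operation makes the whole read-modify-write indivisible.

#include <windows.h>

volatile LONG g_counter = 0;

void increment_not_atomic()
{
    // Still a separate read, add and write; two threads can interleave here
    // and lose an update, even though g_counter is volatile.
    g_counter = g_counter + 1;
}

void increment_atomic()
{
    // Indivisible read-modify-write with respect to other threads.
    InterlockedIncrement(&g_counter);
}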

peterchen

Note: I am not an expert on this topic, some of my statements are "what I heard on the internet", but I think I can still clear up some misconceptions.

[edit] In general, I would rely on platform-specifics such as x86 atomic reads and lack of OOOX only in isolated, local optimizations that are guarded by an #ifdef checking the target platform, ideally accompanied by a portable solution in the #else path.
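
For instance, a sketch of what such a guard could look like (shared_int and load_shared are names I made up; the portable branch assumes a C++11 compiler):

#if defined(_M_IX86) || defined(_M_X64)
#include <intrin.h>                 // _ReadWriteBarrier (MSVC intrinsic)
// x86/x64 fast path: an aligned plain int load is atomic on this target,
// so only the optimizer needs to be restrained.
typedef int shared_int;
inline int load_shared(shared_int& var)
{
    _ReadWriteBarrier();
    return var;
}
#else
#include <atomic>
// Portable fallback: let std::atomic provide both atomicity and ordering.
typedef std::atomic<int> shared_int;
inline int load_shared(shared_int& var)
{
    return var.load(std::memory_order_acquire);
}
#endif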

Things to look out for

  • atomicity of read / write operations
  • reordering due to compiler optimizations (this includes a different order seen by another thread due to simple register caching)
  • out-of-order execution in the CPU

Possible misconceptions

1. As far as I understand it is, because the function is protected and no reordering can be done by the compiler inside it.
[edit] To clarify: the _ReadWriteBarrier provides protection against instruction reordering; however, you have to look beyond the scope of the function. _ReadWriteBarrier was fixed in VS 2010 to do that; earlier versions may be broken (depending on the optimizations they actually do).

Optimization isn't limited to functions. There are multiple mechanisms (automatic inlining, link time code generation) that span functions and even compilation units (and can provide much more significant optimizations than small-scoped register caching).
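
A contrived sketch of why a function boundary alone is not a barrier (the function and variable names are hypothetical): once set_flag is inlined, the two stores sit next to each other and the optimizer may reorder or combine them.

int g_data = 0;
int g_flag = 0;

inline void set_flag()
{
    g_flag = 1;
}

void publish()
{
    g_data = 42;
    set_flag();   // after inlining this is just "g_flag = 1;"; with no barrier
                  // between the stores, nothing stops the optimizer from
                  // reordering them across the former call boundary
}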

2. Visual C++ [...] makes volatile reads and writes atomic loads and stores
Where did you find that? MSDN says that, beyond the standard, it will put memory barriers around reads and writes; there is no guarantee of atomic reads.

[edit] Note that C#, Java, Delphi etc. have different memory models and may make different guarantees.
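
What the MSDN page describes (for Visual C++ 2005 and later; the behavior later given the /volatile:ms switch) is, roughly, that a volatile write behaves as a release and a volatile read as an acquire with respect to the compiler; it says nothing about making arbitrary reads atomic. A sketch of the pattern it enables (my own example, not from the question):

int          g_payload = 0;
volatile int g_ready   = 0;

void producer()
{
    g_payload = 123;
    g_ready = 1;        // MSVC volatile write: release; the payload store is not sunk below it
}

int consumer()
{
    while (g_ready == 0) { }   // MSVC volatile read: acquire, re-read every iteration
    return g_payload;           // sees 123 once g_ready reads as 1
}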

3. plain loads and stores are supposed to be atomic on x86 anyway, right?
No, they are not. Unaligned reads are not atomic. They happen to be atomic if they are well-aligned - a fact I would not rely on unless it is isolated and easily exchanged. Otherwise your "simplification for x86" becomes a lockdown to that target.

[edit] Unaligned reads happen:

char * c = new char[sizeof(int)+1];
load(*(int *)c);      // allowed by standard to be unaligned
load(*(int *)(c+1));  // unaligned with most allocators

#pragma pack(push,1)
struct 
{
   char c;
   int  i;
} foo;
load(foo.i);         // caller said so
#pragma pack(pop)

This is of course all academic if you remember that the parameter must be aligned, and you control all the code. I wouldn't write such code anymore, because I've been bitten too often by laziness in the past.

4. Plain load has acquire semantics on x86, plain store has release semantics
No. x86 processors do not show visible out-of-order execution for ordinary loads and stores (I think), but that doesn't stop the optimizer from reordering instructions.
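
A minimal sketch of what the optimizer alone can do, regardless of how strongly ordered the CPU is (the flag is made up):

int g_stop = 0;   // not volatile, no barriers

void spin_until_stopped()
{
    // With optimizations on, the compiler is allowed to read g_stop once,
    // keep the value in a register and turn this into an infinite loop.
    // The x86 CPU would happily observe the other thread's store, but the
    // generated code never looks at memory again.
    while (g_stop == 0)
    {
        // busy-wait
    }
}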

5. _ReadBarrier / _WriteBarrier / _ReadWriteBarrier do all the magic
They don't - they just prevent reordering by the optimizer. MSDN finally made it a big bad warning for VS2010, but the information apparently applies to previous versions as well.


Now, to your question.

I assume the purpose of the snippet is to pass in an arbitrary variable and load it (atomically?). The straightforward choice would be an interlocked read or (on Visual C++ 2005 and later) a volatile read, as sketched below.
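
For illustration, sketches of both alternatives (the function names are mine): an interlocked read can be expressed as a compare-exchange that never actually changes the value, and a volatile read relies on the MSVC-specific semantics mentioned under point 2.

#include <windows.h>

// Atomic read via a CAS that exchanges 0 for 0: if the value is not 0 the
// comparison fails and the current value is returned; if it is 0, it is
// "replaced" by 0. Either way nothing changes and the read is atomic.
LONG interlocked_load(volatile LONG& var)
{
    return InterlockedCompareExchange(&var, 0, 0);
}

// MSVC-specific volatile read (VC++ 2005 and later): acquire semantics and no
// compiler reordering, but no extra atomicity beyond the aligned x86 load.
int volatile_load(volatile int& var)
{
    return var;
}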

Otherwise you'd need a barrier for both the compiler and the CPU before the read; in VC++ parlance this would be:

int load(int& var)
{   
  // force Optimizer to complete all memory writes:
  // (Note that this had issues before VC++ 2010)
   _WriteBarrier();    

  // force CPU to settle all pending read/writes, and not to start new ones:
   MemoryBarrier();

   // now, read.
   int value = var;    
   return value;
}

Note that _WriteBarrier has a second warning in MSDN: *In past versions of the Visual C++ compiler, the _ReadWriteBarrier and _WriteBarrier functions were enforced only locally and did not affect functions up the call tree. These functions are now enforced all the way up the call tree.*
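
For symmetry, a hedged sketch of what the writer side could look like under the same assumptions (this is my addition, not something from the question):

#include <intrin.h>    // _WriteBarrier

void store(int& var, int value)
{
    // Keep the optimizer from sinking earlier payload writes below this store.
    _WriteBarrier();

    // On x86 a plain aligned store is not reordered with earlier stores by the
    // CPU; on a weaker architecture a MemoryBarrier() would also be needed here.
    var = value;
}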


I hope that is correct. stackoverflowers, please correct me if I'm wrong.