Double-checked locking implementation - using atomic vs not using


I saw this cpp core guideline and I'm trying to understand something:

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#cp111-use-a-conventional-pattern-if-you-really-need-double-checked-locking

Why is the first example marked as bad? Is it only because the variable is volatile? What harm can happen if the first check is not thread-safe, given that the actual initialization is protected by the mutex? At worst we stumble into a locked mutex, but once it's unlocked we won't run the "only once" code.

Consider these 2 options:

class A {};
A costly_init_function();
std::optional<A> a;

// option 1
std::once_flag a_init;
const A& foo() {
    if (!a) {
        std::call_once(a_init, [] { a.emplace(costly_init_function()); });
    }
    return *a;
}

// option 2
std::atomic_bool a_init_flag = false;
std::mutex mutex;
const A& moo() {
    if (!a_init_flag) {
        std::lock_guard lock{mutex};
        if (!a_init_flag) {
            a.emplace(costly_init_function());
            a_init_flag = true;
        }
    }
    return *a;
}

Is there any actual issue that may happen in option 1? To me, it seems the worst that could happen is that we access a in a non-thread-safe way, and as a result we wait for the call_once to finish and then simply skip to returning *a. Should I prefer option 2, which is more expensive but somewhat safer?

edit:

it seems people think that rephrasing my question and explaining it in more detail counts as an answer. I would have just deleted the question, but apparently I can't

so:

  1. yes, I know the first if (!a) is not thread-safe
  2. yes, I know the same result can be achieved by only calling std::call_once
  3. yes, I know what volatile means

trying to be more specific at what I'm asking:

  1. I don't want this function to use a mutex, or whatever std::call_once is doing in order to lock, because it hurts my running time (a mutex costs a lot; call_once takes about 1/3 of what the mutex takes, but is still 7 times slower than "option 1").
  2. I first implemented option 1, knowing the first check is not thread-safe, assuming the worst that could happen is that the race condition causes us to enter the call_once, which has the actual protection, so only some running time would be wasted on the first few calls to the function.
  3. somebody pointed to the linked cpp core guideline, telling me I should probably use an atomic bool in order to also make everything thread-safe.
  4. this obviously has an additional running-time cost, but looking at the bad example it seemed to me that they were comparing apples to oranges, because they used volatile.
  5. I was unsure whether volatile is essential to make the code "bad", or whether using a regular storage variable may also cause an unwanted issue.
  6. I came here to ask this, and instead got a discussion about whether I should use a local static variable, as if this were the actual code being run and not a simplified example.
  7. BTW, maybe the answer is simply 'every race condition is UB, even if you can "prove" that the execution flow is the same as in this example'.

There are 3 answers

Answer by spectras (accepted)

Making the function thread-safe bears a cost. Yes, example 1 is faster; that is because it is not doing the required job.

There is a fundamental error in your thinking: that you can reason about concurrent accesses without atomics or synchronization. That assumption is wrong. Any concurrent read and write of a non-atomic variable is UB.

intro.races

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.

Note that "happens before" has a precise definition. Simply running function A 5 minutes after function B does not validate as "B happens before A".

In particular, a possible outcome here is that another thread sees a as true before it sees the result of the init function.

The problem is not the volatile; it's the lack of sequencing. The guideline only highlights the fact that volatile does not offer sequencing guarantees, and as such the code exhibits the same undefined behavior as if the volatile keyword were not there.

In short:

  • With no synchronisation, there is 0 guarantee on anything.

Eg:

int x = 0;
int y = 0;

void f() {
    y = 1;
    x = 1;
}

Assuming at least one thread runs this function and you have no other synchronisation mechanism in place, a thread reading those can observe y == 0 and x == 1.

On a technical level:

  • the thread that calls f has no reason to flush the writes to memory, and if it does, it has no reason to do so in any particular order.
  • the thread that reads x and y has no reason to invalidate its cache and fetch the modified values from memory, and if it does, it has no reason to do so in any particular order (e.g. it might have y in cache but not x, so it will load x from memory and see the updated value, but use the stale cached y).
Answer by ajm

volatile tells you about the memory's behavior. A classic example of volatile memory is memory-mapped IO. It means, roughly, that the underlying memory value can change for unpredictable reasons, such as being mapped to hardware. It is a separate issue from race conditions.

In your examples, for the purpose of calling costly_init_function exactly once, the first option should suffice, and the if (!a) check is unnecessary.

The purpose of double-checking around a mutex is to acquire the lock as rarely as possible. Having another thread wait on a lock and then acquire it, only to fail a condition and do nothing, is unnecessarily expensive. That version wouldn't result in the critical section running twice, but it would be less performant.
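A minimal sketch of the call_once-only variant this answer describes, dropping the unsynchronized pre-check (std::call_once already performs its own cheap fast-path check internally); A and costly_init_function are stand-ins for the question's types:

```cpp
#include <mutex>
#include <optional>

struct A {};
A costly_init_function() { return A{}; }  // stand-in for the real init

std::optional<A> a;
std::once_flag a_init;

// Option 1 without the racy if (!a): call_once alone guarantees the
// lambda runs exactly once, and all callers see the completed init.
const A& foo() {
    std::call_once(a_init, [] { a.emplace(costly_init_function()); });
    return *a;
}
```

Every call after the first takes only call_once's internal fast path, so there is no data race on a and no double initialization.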

Answer by Chris Dodd

The problem with option 1 is that std::optional<A> is not atomic, so a second thread might test !a while the first thread is calling a.emplace, causing a data race and undefined behavior.

If you get rid of the if(!a) test and just rely on std::call_once's implicit mutex, it should be fine.

IMO it would be better to make both a and a_init static locals of foo rather than globals (to avoid polluting the global namespace). You also don't need std::optional (plain A a; works) unless A is a non-assignable type and/or does not have a default constructor.
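If a static local is acceptable, the whole problem can be sketched even more simply: since C++11, initialization of a function-local static is guaranteed to run exactly once, even under concurrent calls, and compilers typically implement it as a double-checked fast path. A hypothetical version:

```cpp
struct A {};
A costly_init_function() { return A{}; }  // stand-in for the real init

// "Magic statics": the runtime guarantees thread-safe one-time
// initialization of a; concurrent callers block until it completes.
const A& foo() {
    static A a = costly_init_function();
    return a;
}
```

This moves the synchronization burden entirely onto the compiler/runtime, which is usually both the safest and the fastest option when lazy initialization of a single object is all that's needed.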

If you want to avoid the call_once mutex, you could do something with a std::atomic<A*>. You might try std::atomic<std::optional<A>>, but that probably introduces another mutex, since it is unlikely to be lock-free.
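A sketch of the std::atomic<A*> idea, following the conventional double-checked locking pattern the linked guideline recommends; make_a and the variable names are illustrative, not from the answer:

```cpp
#include <atomic>
#include <mutex>

struct A {};
A* make_a() { return new A{}; }  // stand-in for costly_init_function

std::atomic<A*> a_ptr{nullptr};
std::mutex a_mutex;

// Double-checked locking over an atomic pointer: the acquire load on the
// fast path pairs with the release store, so a thread that sees a
// non-null pointer also sees the fully constructed object.
const A& moo() {
    A* p = a_ptr.load(std::memory_order_acquire);  // fast path, no lock
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock{a_mutex};
        p = a_ptr.load(std::memory_order_relaxed);  // re-check under the lock
        if (p == nullptr) {
            p = make_a();
            a_ptr.store(p, std::memory_order_release);  // publish
        }
    }
    return *p;
}
```

After the first call, every caller pays only one acquire load, which on common hardware is close to the cost of a plain read.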