I saw this C++ Core Guideline and I'm trying to understand something:
Why is the first example marked as bad? Is it only because the variable is volatile? What harm can happen if the first check is not thread-safe, assuming the initialization itself is protected by the mutex? In the worst case we will stumble into a locked mutex, but once it's unlocked we won't run the "only once" code.
Consider these 2 options:
#include <atomic>
#include <mutex>
#include <optional>

class A {};
A costly_init_function();
std::optional<A> a;

//option 1
std::once_flag a_init;
const A& foo() {
    if (!a) {
        std::call_once(a_init, [](){ a.emplace(costly_init_function()); });
    }
    return *a;
}
//option 2
std::atomic_bool a_init_flag = false;
std::mutex mutex;
const A& moo() {
    if (!a_init_flag) {
        std::lock_guard lock{mutex};
        if (!a_init_flag) {
            a.emplace(costly_init_function());
            a_init_flag = true;
        }
    }
    return *a;
}
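To make the scenario concrete, here is a minimal sketch of how foo/moo might be exercised from several threads (the thread count and the choice of foo rather than moo are just for illustration):

#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i) {
        // several threads may call foo() before the initialization has finished
        workers.emplace_back([] { (void)foo(); });
    }
    for (auto& t : workers) {
        t.join();
    }
}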
Is there any actual issue that may happen in option 1? To me, it seems like the worst that could happen is that we access a in a non-thread-safe way, and as a result we wait for the call_once to finish and then simply skip to returning *a.
Should I prefer option 2, which is more expensive but somewhat safer?
edit:
It seems people think that rephrasing my question and explaining it in more detail is actually an answer. I would have just deleted the question, but apparently I can't, so:
- yes, I know the first if (!a) is not thread-safe
- yes, I know the same result can be achieved by only calling std::call_once
- yes, I know what volatile means

Trying to be more specific about what I'm asking:
- I don't want this function to use a mutex, or whatever std::call_once is doing in order to lock, because it hurts my running time (mutex: a lot; call_once: about 1/3 of the time the mutex takes, but still 7 times more than "option 1"). A rough sketch of how such a timing comparison might be set up appears after this list.
- I first implemented the first option, knowing the first check is not thread-safe, assuming the worst that can happen is that the race condition gets me into the call_once, which has the actual protection, so only some running time is wasted on the first few calls to the function.
- Somebody pointed to the linked C++ Core Guideline, telling me I should probably use an atomic bool in order to also make everything thread-safe.
- This obviously has an additional running-time cost, but looking at the bad example it seemed to me that they are comparing apples to oranges, because they used volatile.
- I was unsure whether volatile is essential to make the code "bad", or whether using a regular variable may also cause an unwanted issue. I came here to ask this; instead I got a discussion of whether I should use a local static variable, as if this were the actual code being run and not a simplified example.
- BTW, maybe the answer is simply "every race condition is UB, even if you can 'prove' that the execution flow is the same as in this example".
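Here is the rough timing sketch referred to above (the iteration count, the steady_clock choice, and the printing are assumptions for illustration; it measures the steady-state fast path after the first call, building on the snippets above):

#include <chrono>
#include <cstdio>

template <class F>
long long nanos_for(F&& f, int iterations = 10'000'000) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();  // after the first call this only exercises the fast path
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

int main() {
    // a real benchmark should also keep the optimizer from discarding the calls
    std::printf("option 1 (foo): %lld ns\n", nanos_for([] { (void)foo(); }));
    std::printf("option 2 (moo): %lld ns\n", nanos_for([] { (void)moo(); }));
}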
Making the function thread-safe bears a cost. Yes, example 1 is faster; that is because it is not doing the required job.
There is a fundamental error in your thinking: that you can reason about concurrent accesses without atomics or synchronization. That assumption is wrong. Any concurrent read and write of a non-atomic variable is UB.
Note that "happens before" has a precise definition. Simply running function A 5 minutes after function B does not qualify as "B happens before A".
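To make that concrete, here is a minimal sketch of what does establish a happens-before relationship: a release store that an acquire load reads (the names here are illustrative, not from the guideline):

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                    // plain write...
    ready.store(true, std::memory_order_release);    // ...published by a release store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire load that eventually reads true
    // the release store happens-before this point, and the write to payload is
    // sequenced before that store, so reading payload here is safe and sees 42
    assert(payload == 42);
}

int main() {
    std::thread t1{producer};
    std::thread t2{consumer};
    t1.join();
    t2.join();
}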
In particular, a possible outcome in your option 1 is that another thread sees a as non-empty before it sees the result of the init function. The problem is not the volatile, it's the lack of sequencing. This Core Guideline only highlights the fact that volatile does not offer sequencing guarantees, and as such the code exhibits the same undefined behavior as if the volatile keyword were not there.
In short:
Eg:
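(A minimal sketch of such an example; the exact code is assumed here: two plain ints written by f with no synchronisation, y first and then x.)

int x = 0;
int y = 0;

void f() {
    y = 1;   // first write in program order
    x = 1;   // second write in program order
}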
Assuming at least one thread runs this function and you have no other synchronisation mechanism in place, a thread reading those variables can observe y == 0 and x == 1.
On a technical level:
- The thread running f has no reason to flush the writes to memory, and if it does, it has no reason to do so in any particular order.
- The thread reading x and y has no reason to invalidate its cache and fetch the modified values from memory, and if it does, it has no reason to do so in any particular order (e.g. it might have y in cache but not x, so it will load x from memory and see the updated value, but use the cached y).
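For completeness, here is a sketch of how the question's fast path can be kept without the data race: double-checked locking with an atomic flag providing the sequencing (the function name and the explicit memory orders are my own choices; the default sequentially consistent orders, as used in option 2, also work):

// reuses a and costly_init_function from the question's snippet
std::atomic<bool> a_ready{false};
std::mutex a_mutex;

const A& goo() {
    if (!a_ready.load(std::memory_order_acquire)) {        // fast path: one atomic load
        std::lock_guard lock{a_mutex};
        if (!a_ready.load(std::memory_order_relaxed)) {     // relaxed is enough under the mutex
            a.emplace(costly_init_function());
            a_ready.store(true, std::memory_order_release);  // publish only after construction
        }
    }
    // either the acquire load or the mutex orders the construction before this read
    return *a;
}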