I have a class of objects in a multithreaded application where each thread can mark an object for deletion, then a central garbage collector thread actually deletes the object. The threads communicate via member methods that access an internal bool:
class MyObjects {
    ...
    bool shouldBeDeleted() const
    {
        return m_Delete;
    }
    void markForDelete()
    {
        m_Delete = true;
    }
    ...
    std::atomic< bool > m_Delete;
};
The bool has been made an atomic by someone else in the past because Thread Sanitizer kept complaining. However, perf suggests now that there is a processing overhead during the internal atomic load:
│ ↓ cbz x0, 3f4
│ _ZNKSt13__atomic_baseIbE4loadESt12memory_order():
│ {
│ memory_order __b = __m & __memory_order_mask;
│ __glibcxx_assert(__b != memory_order_release);
│ __glibcxx_assert(__b != memory_order_acq_rel);
│
│ return __atomic_load_n(&_M_i, __m);
│ add x0, x0, #0x40
86,96 │ ldarb w0, [x0]
Target platform is GCC, Aarch64 and Yocto Linux.
Now my questions are as follows:
Is atomic really needed in this case? The transition of the bool is one way (from false to true) with no way back while the object lives, so an inconsistency would merely mean that the object is deleted a little later, right?
Is there an alternative to std::atomic<bool> that will silence Thread Sanitizer but is computationally cheaper than std::atomic<bool>?
An obvious modification could be to specify memory_order_relaxed to minimise memory barriers. See https://en.cppreference.com/w/cpp/atomic/memory_order and https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/
Also see Herb Sutter's classic "Atomic Weapons": https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
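As a minimal sketch of the relaxed variant, assuming the flag is the only thing the threads share through these two methods: the class below is a reduced stand-in for the one in the question, and exampleUsage() is a hypothetical single-threaded walk-through added only for illustration. The accesses are still atomic, so Thread Sanitizer should have nothing to report, but no ordering is requested. With GCC on AArch64 a relaxed byte load is usually emitted as a plain ldrb rather than the ldarb shown in the perf output, though that is worth confirming in the generated assembly.

```cpp
#include <atomic>
#include <cassert>

// Reduced stand-in for the class in the question (other members omitted).
class MyObjects {
public:
    // Relaxed load: still atomic, so there is no data race,
    // but no ordering is imposed on surrounding memory accesses.
    bool shouldBeDeleted() const
    {
        return m_Delete.load(std::memory_order_relaxed);
    }

    // Relaxed store: the flag only ever goes from false to true.
    void markForDelete()
    {
        m_Delete.store(true, std::memory_order_relaxed);
    }

private:
    std::atomic<bool> m_Delete{false};
};

// Hypothetical single-threaded walk-through of the one-way transition.
void exampleUsage()
{
    MyObjects obj;
    assert(!obj.shouldBeDeleted());
    obj.markForDelete();
    assert(obj.shouldBeDeleted());
}
```

The default for a bare "return m_Delete;" or "m_Delete = true;" on a std::atomic is memory_order_seq_cst, which is why the explicit load() and store() calls are needed to get the relaxed behaviour.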
Caveat (see the articles above): if anything else depends on the object being flagged for deletion (e.g. another state variable, freeing resources, etc.), then you may need to store the flag with memory_order_release to ensure that setting the "can be deleted" flag occurs last and is not reordered by the compiler's optimiser or the CPU.
Assuming the "garbage collector" is only checking the "can be deleted" flag, it would not need to use memory_order_acquire in the load; relaxed would be sufficient. Otherwise it would need to use acquire to guarantee that any co-dependent accesses are not reordered to occur before the flag is read.
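The co-dependent case could be sketched as follows. The member m_Result and the methods finishAndMark() and tryCollect() are hypothetical names invented for this illustration; they stand in for whatever other state the collector reads after seeing the flag. The release store publishes the earlier plain write, and the acquire load ensures the collector's read of that state happens after it has seen the flag.

```cpp
#include <atomic>
#include <cassert>

// Reduced stand-in for the class in the question, with one piece of
// hypothetical co-dependent state (m_Result).
class MyObjects {
public:
    // Worker side: write the dependent state first, then publish the flag.
    void finishAndMark(int result)
    {
        m_Result = result;                                // plain write
        m_Delete.store(true, std::memory_order_release);  // not reordered before the write above
    }

    // Collector side: returns false until the flag is visible; once it is,
    // the acquire load guarantees m_Result holds the value written before
    // the release store.
    bool tryCollect(int& out) const
    {
        if (!m_Delete.load(std::memory_order_acquire))
            return false;
        out = m_Result;
        return true;
    }

private:
    int m_Result = 0;
    std::atomic<bool> m_Delete{false};
};

// Hypothetical single-threaded walk-through of the call sequence.
void exampleUsage()
{
    MyObjects obj;
    int seen = 0;
    assert(!obj.tryCollect(seen));  // flag not set yet
    obj.finishAndMark(42);
    assert(obj.tryCollect(seen));
    assert(seen == 42);
}
```

In a real program finishAndMark() would run on a worker thread and tryCollect() on the collector thread; the walk-through above only shows the call order, not the concurrency.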