Cheaper alternative to std::atomic<bool>?

721 views Asked by At

I have a class of objects in a multithreaded application where each thread can mark an object for deletion, then a central garbage collector thread actually deletes the object. The threads communicate via member methods that access an internal bool:

class MyObjects {
...   
bool shouldBeDeleted() const
{
   return m_Delete;
}

void
markForDelete()
{
   m_Delete = true;
}
...
   std::atomic< bool >                                        m_IsObsolete;
}

The bool has been made an atomic by someone else in the past because Thread Sanitizer kept complaining. However, perf suggests now that there is a processing overhead during the internal atomic load:

   │     ↓ cbz    x0, 3f4                                                                                                                                                                                                                                                                                                                                                                                            

   │     _ZNKSt13__atomic_baseIbE4loadESt12memory_order():                                                                                                                                                                                                                                                                                                                                                           

   │           {                                                                                                                                                                                                                                                                                                                                                                                                     

   │             memory_order __b = __m & __memory_order_mask;                                                                                                                                                                                                                                                                                                                                                       

   │             __glibcxx_assert(__b != memory_order_release);                                                                                                                                                                                                                                                                                                                                                      

   │             __glibcxx_assert(__b != memory_order_acq_rel);                                                                                                                                                                                                                                                                                                                                                      

   │                                                                                                                                                                                                                                                                                                                                                                                                                 

   │             return __atomic_load_n(&_M_i, __m);                                                                                                                                                                                                                                                                                                                                                                 

   │       add    x0, x0, #0x40                                                                                                                                                                                                                                                                                                                                                                                          

 86,96 │       ldarb  w0, [x0]  

Target platform is GCC, Aarch64 and Yocto Linux.

Now my questions are as follows:

  • Is atomic really needed in this case? The transition of the bool is one way (from false to true) with no way back while the object lives, so an inconsistency would merely mean that the object is deleted a little later, right?

  • Is there an alternative to std::atomic<bool> that will silence Thread Sanitizer but is computationally cheaper than std::atomic<bool>?

2

There are 2 answers

7
Den-Jason On

An obvious modification could be to specify memory_order_relaxed to minimise memory barriers.

See https://en.cppreference.com/w/cpp/atomic/memory_order

and https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/

Also see Herb Sutter's classic "Atomic Weapons" : https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2

m_Delete.store (true, std::memory_order_relaxed);

Caveat (see articles above) - if there are any co-dependencies on the object being flagged for deletion (e.g. another state variable, freeing resources etc) then you may need to use memory_order_release to ensure that the can be deleted flag setting occurs last and is not reordered by the compiler optimiser.

Assuming the "garbage collector" is only checking the can be deleted flag alone it would not need to use memory_order_acquire in the load; relaxed would be sufficient. Otherwise it would need to use acquire to guarantee that any co-dependent accesses are not reordered to occur before reading the flag.

4
davidbak On

The problem (as clarified in a comment by the OP) is not a true GC but is instead delayed deletion of objects on a separate thread so as to unburden the main processing threads from the time it takes to to the delete. All objects to be deleted are marked such at some time - at a later time the deletion thread comes along and deletes them.

Consider first: Is it really the case that delayed deletion is necessary in order to meet the program's performance goals - specifically, latency? It might just be extra overhead that actually impacts latency. (Or perhaps there are also different performance goals, e.g., throughput, to be considered.) Delayed deletion isn't an obvious performance win in all cases - you need to find out if it is appropriate in each case. (For example, it might not even be necessary for all deletions: perhaps some deletions can be done immediately in-line without impacting performance while others need to be deferred. This could be because, for example, different processing threads are doing different things with different latency/throughput requirements.)

Now to a solution: Since we're talking deferred deletion - there is no reason the deletion thread needs to scan all objects looking for the ones to delete (each time it does a full scan). Instead, pay a slightly larger cost at the time you mark the object for deletion and pay no cost to scan all objects. Do this by linking deleted objects onto a deletion work list. There is a synchronization cost there (which can be minimized in various ways besides obvious locks) but it is paid once per object not once per object per scan.

(Doesn't have to be a linked list either. If there is an upper bound to how many objects can be deleted in a period of time you can just use an appropriate array.)

There are other possibilities that are opened up by characterizing this problem more precisely as "deferred deletion" rather than "garbage collection": some constraints are lifted (perhaps others are added).