Release-Consume ordering for reference counting


Consider the following simple reference counting functions (to be used with boost::intrusive_ptr):

class Foo {
    // ...

    std::atomic<std::size_t> refCount_{0};

    friend void intrusive_ptr_add_ref(Foo* ptr)
    {
        ++ptr->refCount_;  // ❶
    }

    friend void intrusive_ptr_release(Foo* ptr)
    {
        if (--ptr->refCount_ == 0) {  // ❷
            delete ptr;
        }
    }
};

I'm still learning memory ordering, and I'm wondering whether the default memory ordering for fetch_add/fetch_sub (memory_order_seq_cst) is too strict in this case. Since the only ordering I want to ensure is between ❶ and ❷, I think we can replace ❶ with

ptr->refCount_.fetch_add(1, std::memory_order_release);

and ❷ with

if (ptr->refCount_.fetch_sub(1, std::memory_order_consume) == 1) {

But memory ordering is still new and subtle to me, so I'm not sure if this will work correctly. Did I miss anything?

There are 3 answers

Potatoswatter On BEST ANSWER

Consulting the libc++ implementation of std::shared_ptr, you might want memory_order_relaxed for the increment and memory_order_acq_rel for the decrement. Rationalizing this usage:

If the number increases, then all that matters is its consistency. The current thread is already sure that it's greater than zero. Other threads are unsynchronized so they will see the update at an indeterminate time before the next atomic modification, and perhaps at a time inconsistent with updates of other variables.

If the number decreases, then you need to be sure the current thread has already finished modifying the object, and that all updates from other threads are visible to the thread performing the final decrement. Each decrement must also be visible to the next one; otherwise, if the counter's updates could become visible ahead of the updates to the object it guards, the object could be destroyed prematurely.
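Putting those two points together, here is a minimal sketch of the two functions with those orderings, written against the Foo class from the question (it follows the rationale above rather than quoting libc++ verbatim):

friend void intrusive_ptr_add_ref(Foo* ptr)
{
    // The increment needs only atomicity; it does not order any other memory operation.
    ptr->refCount_.fetch_add(1, std::memory_order_relaxed);
}

friend void intrusive_ptr_release(Foo* ptr)
{
    // acq_rel: the release half publishes this thread's writes to the object,
    // and the acquire half makes every earlier decrementer's writes visible
    // before the object is destroyed.
    if (ptr->refCount_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete ptr;
    }
}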

Cppreference has a nice page on memory ordering. It includes this note:

The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged.

It also points out that, in effect, no current compiler implements consume as anything other than acquire; in practice the two are the same.

Simon Richter On

Incrementing the reference count does not require any synchronization in a correct program, just atomicity.

Think of references as being owned by threads. A thread may only use a referenced object if the reference count is at least one and is guaranteed not to drop to zero while the object is in use. That guarantee holds either because the thread itself incremented the reference count for the duration of its use, or because some other mechanism ensures the condition.

Thus we can assume that the thread incrementing the reference count already owns a reference that lets it access the object's counter, so no other thread can decrement the counter to zero while the increment is in progress. The only thread allowed to drop that original reference is either the current thread itself (after it has incremented the count), or another thread once the current thread has signaled that its shared use of the object (i.e. the "ownership" of the original reference) has ended -- and both of those are visible side effects.

On the other hand, decrementing the reference counter requires acquire and release semantics, as the object may be destroyed afterwards.

The cppreference.com page on std::memory_order says:

Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor).
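For comparison, a common variant keeps the non-final decrements cheaper by releasing on every decrement and acquiring only on the last one via a fence. This is a sketch of the pattern shown in the Boost.Atomic usage examples, again assuming the question's Foo class:

friend void intrusive_ptr_release(Foo* ptr)
{
    // Release: publish this thread's writes to the object before dropping the reference.
    if (ptr->refCount_.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire only when we are about to destroy the object, so that every
        // other thread's writes to it are visible to the destructor.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete ptr;
    }
}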

curiousguy On
if (ptr->refCount_.fetch_sub(1, std::memory_order_consume) == 1) {

That one is clearly wrong: std::memory_order_consume only makes sense when the loaded value feeds a later dependent operation. In the question's code, delete ptr does not depend on the value returned by fetch_sub, so consume provides no useful ordering there. Dependencies look like this:

r = a.fetch_sub(1, std::memory_order_consume);
v[r] = 1;        // depends on r
x = r + 2;       // depends on r
z = r - r;       // still a legal dependency, but many compilers mis-compile it
y = r / r;       // still a legal dependency
f(r);            // where f is declared with [[carries_dependency]]
r2 = r;          // transfers the dependency
v[r2] = 2;       // dependency-ordered
r2 = r ? 1 : 0;  // breaks the dependency
v[r2] = 2;       // NOT dependency-ordered

Consume is a zero-cost ordering on most CPUs, because the hardware already guarantees correct ordering whenever the result of one operation feeds into the next. But that assumes the assembly code mirrors the C++ source exactly, which compilers cannot in general guarantee; consider this:

int a[1];
a[r] = 2;  // r can only validly be 0; anything else is out of bounds

Here the compiler can assume that r is 0: there is exactly one element, so the code does the same as:

a[0] = 2;

but that version no longer depends on r, so in the context of a consume ordering the generated assembly must still use r for the addressing.

This means compilers must be able to selectively disable many common, simple optimizations that are performed in a language-independent part of the compiler.

Those are the obvious cases that must not be optimized away; there are more elaborate ones, such as a pointer to a singleton.
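To illustrate that last point with a hypothetical sketch (the names here are made up): if the compiler can prove that the only non-null value a consume-loaded pointer can hold is the address of a known singleton, it may access the singleton directly and silently drop the data dependency that the hardware ordering relies on.

#include <atomic>

struct Widget { int value; };
Widget theWidget;                          // the single instance
std::atomic<Widget*> published{nullptr};   // a writer stores &theWidget after initializing it

int reader()
{
    Widget* p = published.load(std::memory_order_consume);
    if (p) {
        // Intended: this read is dependency-ordered after the writer's stores.
        // A compiler that knows p can only be &theWidget may emit
        // theWidget.value instead, which no longer depends on p.
        return p->value;
    }
    return 0;
}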