Why accessing pthread keys' sequence number is not synchronized in glibc's NPTL implementation?

256 views Asked by At

Recently when I look into how the thread-local storage is implemented in glibc, I found the following code, which implements the API pthread_key_create()

int
__pthread_key_create (key, destr)
      pthread_key_t *key;
      void (*destr) (void *);
{
    /* Find a slot in __pthread_kyes which is unused.  */
    for (size_t cnt = 0; cnt < PTHREAD_KEYS_MAX; ++cnt)
    {
        uintptr_t seq = __pthread_keys[cnt].seq;

        if (KEY_UNUSED (seq) && KEY_USABLE (seq)
            /* We found an unused slot.  Try to allocate it.  */
            && ! atomic_compare_and_exchange_bool_acq (&__pthread_keys[cnt].seq,
                                                       seq + 1, seq))
        {
            /* Remember the destructor.  */
            __pthread_keys[cnt].destr = destr;

            /* Return the key to the caller.  */
            *key = cnt;

            /* The call succeeded.  */
            return 0;
       }
    }

    return EAGAIN;
}

__pthread_keys is a global array accessed by all threads. I don't understand why the read of its member seq is not synchronized as in the following:

uintptr_t seq = __pthread_keys[cnt].seq;

although it is syncrhonized when modified later.

FYI, __pthread_keys is an array of type struct pthread_key_struct, which is defined as follows:

/* Thread-local data handling.  */
struct pthread_key_struct
{
    /* Sequence numbers.  Even numbers indicated vacant entries.  Note
       that zero is even.  We use uintptr_t to not require padding on
       32- and 64-bit machines.  On 64-bit machines it helps to avoid
       wrapping, too.  */
    uintptr_t seq;

    /* Destructor for the data.  */
    void (*destr) (void *);
};

Thanks in advance.

1

There are 1 answers

3
jspcal On

In this case, the loop can avoid an expensive lock acquisition. The atomic compare and swap operation done later (atomic_compare_and_exchange_bool_acq) will make sure only one thread can successfully increment the sequence value and return the key to the caller. Other threads reading the same value in the first step will keep looping since the CAS can only succeed for a single thread.

This works because the sequence value alternates between even (empty) and odd (occupied). Incrementing the value to odd prevents other threads from acquiring the slot.

Just reading the value is fewer cycles than the CAS instruction typically, so it makes sense to peek at the value, before doing the CAS.

There are many wait-free and lock-free algorithms that take advantage of the CAS instruction to achieve low-overhead synchronization.