Take this simple function that increments an integer under a lock implemented by std::mutex:
#include <mutex>
std::mutex m;
void inc(int& i) {
    std::unique_lock<std::mutex> lock(m);
    i++;
}
I would expect this (after inlining) to compile in a straightforward way to a call of m.lock(), an increment of i, and then a call of m.unlock().
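To make that expectation concrete, here is a minimal sketch of the hand-written equivalent. The names expected_m and inc_expected are hypothetical and chosen only to avoid clashing with the original m and inc; since incrementing an int cannot throw, the explicit unlock is equivalent to the RAII version here.

```cpp
#include <mutex>

// Hypothetical names, used only for illustration.
std::mutex expected_m;

// What the unique_lock version is expected to boil down to after inlining:
// lock, increment, unlock.
void inc_expected(int& i) {
    expected_m.lock();
    ++i;
    expected_m.unlock();
}
```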
Checking the generated assembly for recent versions of gcc and clang, however, we see an extra complication. Taking the gcc version first:
inc(int&):
mov eax, OFFSET FLAT:__gthrw___pthread_key_create(unsigned int*, void (*)(void*))
test rax, rax
je .L2
push rbx
mov rbx, rdi
mov edi, OFFSET FLAT:m
call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
test eax, eax
jne .L10
add DWORD PTR [rbx], 1
mov edi, OFFSET FLAT:m
pop rbx
jmp __gthrw_pthread_mutex_unlock(pthread_mutex_t*)
.L2:
add DWORD PTR [rdi], 1
ret
.L10:
mov edi, eax
call std::__throw_system_error(int)
It's the first couple of lines that are interesting. The generated code examines the address of __gthrw___pthread_key_create (which is the implementation for pthread_key_create, a function to create a thread-local storage key), and if it is zero, it branches to .L2, which implements the increment in a single instruction without any locking at all.
If it is non-zero, it proceeds as expected: locking the mutex, doing the increment, and unlocking.
clang does even more: it checks the address of the function twice, once before the lock and once before the unlock:
inc(int&): # @inc(int&)
push rbx
mov rbx, rdi
mov eax, __pthread_key_create
test rax, rax
je .LBB0_4
mov edi, m
call pthread_mutex_lock
test eax, eax
jne .LBB0_6
inc dword ptr [rbx]
mov eax, __pthread_key_create
test rax, rax
je .LBB0_5
mov edi, m
pop rbx
jmp pthread_mutex_unlock # TAILCALL
.LBB0_4:
inc dword ptr [rbx]
.LBB0_5:
pop rbx
ret
.LBB0_6:
mov edi, eax
call std::__throw_system_error(int)
What's the purpose of this check?
Perhaps it is to support the case where the object file is ultimately linked into a binary without pthreads support, falling back to a version without locking in that case? I couldn't find any documentation on this behavior.
Your guess looks to be correct. From the libgcc/gthr-posix.h file in gcc's source repository (https://github.com/gcc-mirror/gcc.git):
Then throughout the remainder of the file, many of the pthread APIs are wrapped inside checks of the __gthread_active_p() function. If __gthread_active_p() returns 0, nothing is done and success is returned.
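The wrapping pattern described above can be sketched roughly as follows. This is not the libgcc source: demo_key_create, demo_threads_active, demo_mutex_lock, demo_mutex_unlock, demo_mutex and demo_inc are hypothetical names, and the sketch assumes a GCC or clang toolchain (for __typeof and the weakref attribute) on a POSIX system. The idea is that a weak reference has a null address when nothing in the final link provides the symbol, so the wrappers can skip the real pthread calls.

```cpp
#include <pthread.h>

// Hypothetical local alias: a weak reference to __pthread_key_create, the
// symbol whose address the assembly above tests. If no library in the final
// link provides it, the address evaluates to null instead of failing to link.
static __typeof(pthread_key_create) demo_key_create
    __attribute__((__weakref__("__pthread_key_create")));

// Hypothetical helper playing the role described for __gthread_active_p().
bool demo_threads_active() {
    return &demo_key_create != nullptr;
}

// Hypothetical wrappers: when threads are not active, do nothing and
// report success (0); otherwise forward to the real pthread call.
int demo_mutex_lock(pthread_mutex_t* mtx) {
    return demo_threads_active() ? pthread_mutex_lock(mtx) : 0;
}

int demo_mutex_unlock(pthread_mutex_t* mtx) {
    return demo_threads_active() ? pthread_mutex_unlock(mtx) : 0;
}

pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

// Once the wrappers are inlined, this has the same shape as the clang
// output: one address test before the lock and one before the unlock.
void demo_inc(int& i) {
    demo_mutex_lock(&demo_mutex);
    ++i;
    demo_mutex_unlock(&demo_mutex);
}
```

Whichever branch is taken at run time, demo_inc increments its argument exactly once; only the locking is conditional.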