#include <pthread.h>
#include <iostream>
#include <thread>
int main(){
pthread_mutex_t mut;
if (pthread_mutex_init(&mut, NULL) != 0){
return 0;
}
int a = 0;
auto t1 = std::thread([&](){
for(int c = 0; c<=10000;c++){
pthread_mutex_lock(&mut);
a += 1; //#1 suppose this non-atomic object is protected by the mutex
pthread_mutex_unlock(&mut);
}
});
auto t2 = std::thread([&](){
int r = 0;
for(;;){
pthread_mutex_lock(&mut);
r = a;
std::cout<< a <<std::endl; //#2 suppose this non-atomic object is protected by the mutex
pthread_mutex_unlock(&mut);
if(r>=10000){
return;
}
}
});
t1.join();
t2.join();
}
According to [intro.races] p10
An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:
- [...]
- A inter-thread happens before B.
An evaluation A inter-thread happens before an evaluation B if
- A synchronizes with B, or
- [...]
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
Does this code rely on a feature that is guaranteed by the implementation while being UB according to the C++ standard?
UPDATE:
The previous content did not show why I think this code is UB, so here is my explanation. #1 is a modification of a non-atomic object and #2 is a read of the same object, and they occur in different threads. They do not constitute a data race if and only if one of them happens before the other. In short, pthread_mutex_unlock(&mut) in t1 must synchronize with pthread_mutex_lock(&mut) in t2 so that #1 happens before #2, or vice versa.
However, the C++ standard does not say that pthread_mutex_unlock(&mut) synchronizes with pthread_mutex_lock(&mut). Hence the program contains a data race as defined in [intro.races] p21, which results in UB.
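For comparison, here is a minimal sketch of a synchronization that the standard itself does define: a release store that an acquire load reads from. The function name publish_and_read and the variables are made up for illustration. The point is only that the write to the non-atomic payload happens before the read because of the synchronizes-with edge, which is the kind of edge the reasoning above is looking for between the pthread calls.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: returns the value the consumer thread observed.
int publish_and_read() {
    int payload = 0;                  // non-atomic object shared by both threads
    std::atomic<bool> ready{false};   // atomic flag used for synchronization
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // write (like #1)
        ready.store(true, std::memory_order_release);  // release store
    });
    std::thread consumer([&] {
        // The acquire load that reads `true` synchronizes with the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;                                // read (like #2), no data race
    });

    producer.join();
    consumer.join();
    return seen;  // join() synchronizes with the completion of each thread
}
```

Calling publish_and_read() is expected to return 42, since the consumer only reads payload after observing the flag.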
Claim
I don't know why this question was marked as a duplicate of "Unspecified, Undefined vs. Implementation-defined". First, as the comments show, whether the code is undefined is itself a point of debate. Second, this question is not about the difference between UB, unspecified, and implementation-defined behavior.
It's not UB. Concurrent access to int a doesn't actually happen when the program is executed on any real system with a working implementation of pthreads, because the pthread functions prevent that.
They have well-defined externally-visible behaviour, defined by standards other than ISO C++, specifically POSIX. On all non-buggy implementations that provide these functions, the behaviour is well-defined.
Without declarations/definitions, the program would be ill-formed, diagnostic required, not UB (see Footnote 1).
The key concept here is that when ISO C++ says "otherwise, the behaviour is undefined", that just means that standard doesn't define it. It doesn't forbid real C++ implementations from defining the behaviour for more cases (see Footnotes 1 and 2).
The existence in the standard of a few ways to create synchronization (including std::mutex, std::atomic, and std::thread creation and .join) also doesn't forbid implementations from defining other ways to create synchronization, such as hand-written asm and/or system calls. An implementation can also define the behaviour of data races on volatile objects and provide fences against compile-time and run-time reordering that work with that.
C++ is intended to be extensible with platform-specific functions or even language extensions. The standard explicitly says so: [intro.compliance.general] /8; the only restriction is that extensions can't break a program the ISO C++ standard says is well-formed.
If we're talking about this program on a system that doesn't have pthreads at all (because it only provides things that the ISO C++ standard says it has to), the program is "ill-formed" and can't run; it won't even reach the point of having data-race UB. That's obviously uninteresting; I'm discussing the case of systems that have correct pthreads implementations. (Also not just stubs with those names that return without doing anything.)
In terms of the C++ abstract machine where std::mutex and std::atomic are primitive operations, you could think of pthreads as a third-party library of opaque functions that work as if they used std::mutex and/or std::atomic internally to implement functions like pthread_mutex_lock. You could actually write functions with similar names and call them this way to produce a thread-safe program, e.g. with pthread_mutex_t = std::mutex and pthread_mutex_lock as m->lock().
So the pthreads library's externally-visible behaviour isn't "special"; it doesn't do anything you couldn't do in pure ISO C++. (At least not in the simplest mutex function you're showing here.)
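As a rough sketch of that idea, the wrappers below forward to std::mutex. The names my_mutex_t, my_mutex_lock, my_mutex_unlock and run_counter are hypothetical, picked so they don't clash with the real <pthread.h> declarations; this is not how any actual pthreads library is written, only an illustration that the same interface could sit on top of pure ISO C++.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for pthread_mutex_t / pthread_mutex_lock / pthread_mutex_unlock.
using my_mutex_t = std::mutex;

inline int my_mutex_lock(my_mutex_t* m)   { m->lock();   return 0; }
inline int my_mutex_unlock(my_mutex_t* m) { m->unlock(); return 0; }

// Same shape as the program in the question: one writer, one polling reader.
// Returns the last value the reader observed.
int run_counter(int iterations) {
    my_mutex_t mut;
    int a = 0;          // non-atomic object protected by mut
    int last_seen = 0;  // written only by the reader thread

    std::thread writer([&] {
        for (int c = 0; c < iterations; ++c) {
            my_mutex_lock(&mut);
            a += 1;
            my_mutex_unlock(&mut);
        }
    });
    std::thread reader([&] {
        for (;;) {
            my_mutex_lock(&mut);
            last_seen = a;
            my_mutex_unlock(&mut);
            if (last_seen >= iterations) return;
        }
    });

    writer.join();
    reader.join();
    return last_seen;
}
```

Here the unlock in one thread synchronizes with the next lock in the other because std::mutex is specified that way, so a caller who only sees the wrapper names still gets a data-race-free program.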
If you wrote your own library that creates synchronization between threads (externally having opaque types but internally allocating arrays of std::atomic for example), it wouldn't be weird to document the synchronization without mentioning the details of the internal implementation.
The C++ standard doesn't say your functions called foo and bar can synchronize either, or that the only libraries are the ones documented in the standard! As mentioned above, ISO C++ explicitly says implementations can provide more library functions.
In actual reality for implementations that have pthreads, those functions are the primitive operations, the building blocks for std::mutex and std::thread. And they often use implementation-specific stuff like asm statements or hand-written asm functions. There is formal documentation for how such extensions work in many implementations, although you might have a hard time finding formal-enough documentation of every step to prove things correct all the way down. e.g. the way separately-written asm functions interact with C and C++ compiler-generated code is so well-known by compiler devs that it's maybe not all formally specified in terms of memory-model stuff, so there might be some level of "of course it works" if you want to drill down to the internal implementation of a specific libpthread.
std::atomic is implemented in terms of first-class language features in mainstream compilers, for example GCC and Clang have builtin functions like __atomic_load_n and __atomic_exchange_n which compilers know how to inline and understand how they can optimize memory accesses before/after an atomic operation depending on the memory_order parameter.
pthread_mutex stuff could be implemented using mostly std::atomic, but real implementations of pthreads predate C++11 / C11.
For that and other reasons, they often have functions hand-written in asm, the same way strlen, memcmp and many other standard library functions are on real implementations. (They implement behaviour as if they were written in C or C++; I don't think the GCC manual explicitly spells out the language-lawyer aspect of calling a function written in asm, because it's so far from thinking in terms of the abstract machine. GNU C / C++ does define rules for things like asm("" ::: "memory") clobbers which are usable as a compiler memory barrier like std::atomic_signal_fence(seq_cst), but the actual definition is in terms of a more asm-based memory model of variables in memory having values that match what the abstract machine says they should.)
In terms of implementing the required ordering guarantees, a hand-written asm function is fully opaque to the optimizer, and has to be assumed to do anything like mutex->lock(), atomic seq_cst load, store, atomic_thread_fence(seq_cst), and read + write all globally-reachable objects.
See "How does a mutex lock and unlock functions prevents CPU reordering?" for why being non-inline is sufficient. If you defined pthread_mutex_lock in a header so it could inline, you'd write it with sufficient ordering baked in using __atomic / std::atomic, or GNU C inline asm statements, using low-level implementation-defined behaviour to achieve the necessary high-level ordering compatible with how this implementation treats atomics and memory ordering in general.
The internal implementation for any given target ISA using inline or stand-alone asm will create synchronization on that ISA according to the hardware memory-order rules, perhaps using asm fence instructions or acquire loads / release stores, to be at least as strong as what ISO C++ requires. Perhaps stronger if the ISA doesn't have a way to do just acquire RMWs and release stores, like x86 where any atomic RMW has to be a full barrier (lock cmpxchg for example). Combined with the rules the implementation defines for asm, this can create high-level ordering.
Footnote 1: a missing definition isn't UB, it's an ill-formed program
http://eel.is/c++draft/intro.compliance#general-1
http://eel.is/c++draft/basic.lookup#general-1
So a warning or error is required for a call to an undefined function. I guess if an implementation wanted to, it could "try" to run an ill-formed program anyway after warning about it, with no guarantees about behaviour? But then we're outside of C++ land. To silently break this code, an implementation would have to provide a definition of pthread_mutex_lock which didn't actually do locking.
(Or a pthread_mutex_lock that's incompatible with the cores that std::thread runs the program across? It's only guaranteed by POSIX to work for threads started by pthread_create, and I guess it's hypothetically possible to have a system where one thread-creation method runs threads across a set of cores that requires one flavour of asm instructions for correct locking, which doesn't work for threads created with a different interface.)
http://eel.is/c++draft/intro.defs#defns.undefined - definition of UB
So it's not UB to use these functions on a system that doesn't have them at all, it's just an ill-formed program.
Also note how the definition explicitly grants implementations permission to define whatever behaviour they want for things ISO C++ leaves undefined, as in the next footnote. The phrase "in a documented manner" implies that this includes things suitable for production use on those implementations.
Footnote 2: defining more behaviour
For example, GCC language extensions such as typedef uint32_t aliasing_u32 __attribute__((aligned(1),may_alias)) let you deref an aliasing_u32* pointing at any object without regard for the normal aliasing rules, the same way you can with unsigned char. Also without regard for the normal alignment rules.
Another example is compiling with g++ -fwrapv. g++ -fwrapv is a C++ implementation that defines the behaviour of signed integer overflow (as 2's complement wraparound). Not just as the result you happen to get from UB, but truly well-defined so the optimizer has to respect it. And -fwrapv -fsanitize=undefined won't check for signed-integer overflow.
Such code would or could encounter undefined behaviour if compiled with an implementation that only defined the bare minimum that ISO C++ requires it to define, but not on the implementations they're written for.
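To make the may_alias example concrete, here is a small sketch assuming a compiler that supports the GNU attribute syntax (GCC or Clang). The helper names load_u32_ext and load_u32_portable are made up for illustration. The first relies on behaviour the implementation defines through the extension; the second is the memcpy form that plain ISO C++ already defines.

```cpp
#include <cstdint>
#include <cstring>

// GNU extension: may_alias exempts accesses through this type from the
// strict-aliasing rule, and aligned(1) removes the usual alignment requirement.
typedef std::uint32_t aliasing_u32 __attribute__((aligned(1), may_alias));

// Hypothetical helper: load 4 bytes from any address via the extension type.
// Defined behaviour on implementations that document these attributes.
std::uint32_t load_u32_ext(const void* p) {
    return *static_cast<const aliasing_u32*>(p);
}

// Portable counterpart: memcpy into a properly typed object.
std::uint32_t load_u32_portable(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Both helpers should return the same value for the same bytes, including at odd addresses such as buf + 1 in an unsigned char buffer.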
ISO C++ doesn't overrule the ability of an implementation to define the behaviour of various operations, including inline asm, separately-compiled asm, and language extensions like GNU C __atomic.
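As an illustration of synchronization built on such an extension, the sketch below makes a tiny lock out of the GNU C __atomic builtins instead of std::atomic. It assumes GCC or Clang. The names ext_lock, ext_lock_acquire, ext_lock_release and run_two_writers are hypothetical, and real pthreads implementations are considerably more involved (futexes, hand-written asm, and so on).

```cpp
#include <thread>

// Hypothetical opaque lock type; callers would not normally look inside.
struct ext_lock {
    int state = 0;  // 0 = unlocked, 1 = locked
};

inline void ext_lock_acquire(ext_lock* l) {
    // GNU builtin: atomic exchange with acquire ordering. Spin until the
    // previous value was 0, meaning this thread took the lock.
    while (__atomic_exchange_n(&l->state, 1, __ATOMIC_ACQUIRE) != 0) {
        std::this_thread::yield();
    }
}

inline void ext_lock_release(ext_lock* l) {
    // GNU builtin: atomic store with release ordering.
    __atomic_store_n(&l->state, 0, __ATOMIC_RELEASE);
}

// Two threads increment a plain int under the lock; returns the final count.
int run_two_writers(int per_thread) {
    ext_lock lock;
    int counter = 0;  // non-atomic, protected by lock

    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            ext_lock_acquire(&lock);
            ++counter;
            ext_lock_release(&lock);
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

The ordering guarantee here comes from how the implementation documents these builtins, not from anything ISO C++ says about a plain int, which is the same situation as with pthread_mutex_lock.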