Is the use of platform-provided synchronous operations a kind of UB?

#include <iostream>
#include <pthread.h>
#include <thread>
int main(){
  pthread_mutex_t mut;
  if (pthread_mutex_init(&mut, NULL) != 0){
     return 0;
  }
  int a = 0; 
  auto t1 = std::thread([&](){
     for(int c = 0; c<=10000;c++){
        pthread_mutex_lock(&mut);
        a += 1;  //#1 suppose this non-atomic object is protected by the mutex
        pthread_mutex_unlock(&mut);
     }
  });
  auto t2 = std::thread([&](){
     int r = 0;
     for(;;){
        pthread_mutex_lock(&mut);
        r = a;
        std::cout<< a <<std::endl;  //#2 suppose this non-atomic object is protected by the mutex
        pthread_mutex_unlock(&mut);
        if(r>=10000){
           return;
        }
     }
  });
  t1.join();
  t2.join();
  pthread_mutex_destroy(&mut);
}

According to [intro.races] p10

An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:

  • [...]
  • A inter-thread happens before B.

[intro.races] p9

An evaluation A inter-thread happens before an evaluation B if

  • A synchronizes with B, or
  • [...]

and [intro.races] p21

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.

Does this code rely on behaviour that is guaranteed by the implementation but is UB according to the C++ standard?

UPDATE:

The previous content doesn't explain why I think this code is UB, so here is my reasoning. #1 is a modification of a non-atomic object and #2 is a read of the same object, and they occur in different threads. They do not form a data race if and only if one of them happens before the other. In short, pthread_mutex_unlock(&mut) in t1 must synchronize with pthread_mutex_lock(&mut) in t2 so that #1 happens before #2, or vice versa.

However, the C++ standard does not say the operation pthread_mutex_unlock(&mut) can synchronize with the operation pthread_mutex_lock(&mut), hence [intro.races] p21 is violated, which results in UB.
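For comparison, here is a minimal sketch of the same program written with std::mutex instead of the pthreads calls. In this version ISO C++ itself supplies the guarantee the question is looking for: for std::mutex, prior unlock() operations on the same object synchronize with a later lock() ([thread.mutex.requirements.mutex]), so #1 and #2 are ordered by happens-before. The function name run_counter_demo is hypothetical, and the printing is left out to keep it short.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Same shape as the question's program, but with std::mutex. ISO C++
// specifies that unlock() synchronizes with a later lock() of the same mutex,
// so #1 and #2 are ordered by happens-before and there is no data race.
int run_counter_demo() {
    std::mutex mut;
    int a = 0;                                  // non-atomic, protected by mut

    std::thread t1([&] {
        for (int c = 0; c <= 10000; c++) {      // 10001 iterations
            std::lock_guard<std::mutex> lock(mut);
            a += 1;                             // #1
        }
    });
    std::thread t2([&] {
        for (;;) {
            int r;
            {
                std::lock_guard<std::mutex> lock(mut);
                r = a;                          // #2
            }
            if (r >= 10000) return;
        }
    });

    t1.join();
    t2.join();                                  // join() also synchronizes
    assert(a == 10001);
    return a;
}
```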

Claim

I don't know why this question was marked as a duplicate of "Unspecified, Undefined vs. Implementation-defined". First, as the comments show, whether this code is undefined is itself a point of debate. Second, this question is not about the difference between undefined, unspecified, and implementation-defined behavior.


There are 3 answers

Answer by Peter Cordes

It's not UB. Concurrent access to int a doesn't actually happen when the program is executed on any real system with a working implementation of pthreads, because pthread functions prevent that.
They have well-defined externally-visible behaviour, defined by standards other than ISO C++, specifically POSIX. On all non-buggy implementations that provide these functions, the behaviour is well-defined.

Without declarations/definitions, the program would be ill-formed (diagnostic required), not UB (see footnote 1).

The key concept here is that when ISO C++ says "otherwise, the behaviour is undefined", that just means that standard doesn't define it. It doesn't forbid real C++ implementations from defining the behaviour for more cases (footnotes 1 and 2).

The existence in the standard of a few ways to create synchronization (including std::mutex, std::atomic, and std::thread creation and .join) also doesn't forbid implementations from defining other ways to create synchronization, such as hand-written asm and/or system calls. Nor does it forbid them from defining the behaviour of data races on volatile objects and providing fences against compile-time and run-time reordering that work with that.

C++ is intended to be extensible with platform-specific functions or even language extensions. The standard explicitly says so: [intro.compliance.general] /8; the only restriction is that extensions can't break a program the ISO C++ standard says is well-formed.


If we're talking about this program on a system that doesn't have pthreads at all (because it only provides things that the ISO C++ standard says it has to), the program is "ill-formed" and can't run; it won't even reach the point of having data-race UB. That's obviously uninteresting; I'm discussing the case of systems that have correct pthreads implementations. (Also not just stubs with those names that return without doing anything.)


In terms of the C++ abstract machine where std::mutex and std::atomic are primitive operations, you could think of pthreads as a third-party library of opaque functions that work as if they used std::mutex and/or std::atomic internally to implement functions like pthread_mutex_lock. You could actually write functions with similar names and call them this way to produce a thread-safe program, e.g. with pthread_mutex_t = std::mutex and pthread_mutex_lock as m->lock().
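To make that concrete, here is a minimal sketch of such a "third-party" library: hypothetical functions shim_mutex_lock / shim_mutex_unlock that look like an opaque C API but forward to std::mutex. Code calling them is data-race-free by ordinary ISO C++ rules, even though the standard never mentions these names. (They are deliberately not called pthread_mutex_lock, to avoid clashing with a real libpthread.)

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical library with pthreads-like names. Callers only see the
// functions; internally they forward to std::mutex, so the ISO C++ guarantee
// (unlock synchronizes with a later lock) carries over to their callers.
struct shim_mutex_t {
    std::mutex m;
};

inline void shim_mutex_lock(shim_mutex_t* mut) { mut->m.lock(); }
inline void shim_mutex_unlock(shim_mutex_t* mut) { mut->m.unlock(); }

int shim_demo() {
    shim_mutex_t mut;
    int a = 0;                       // non-atomic, protected by mut
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            shim_mutex_lock(&mut);
            a += 1;
            shim_mutex_unlock(&mut);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    assert(a == 20000);              // no lost updates
    return a;
}
```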

So the pthreads library's externally-visible behaviour isn't "special"; it doesn't do anything you couldn't do in pure ISO C++. (At least not in the simplest mutex function you're showing here.)

Quoting the question: "However, the C++ standard does not say the operation pthread_mutex_unlock(&mut) can synchronize with the operation pthread_mutex_lock(&mut), hence [intro.races] p21 is violated, which results in UB."

If you wrote your own library that creates synchronization between threads (externally having opaque types but internally allocating arrays of std::atomic for example), it wouldn't be weird to document the synchronization without mentioning the details of the internal implementation.

The C++ standard doesn't say your functions called foo and bar can synchronize either, or that the only libraries are the ones documented in the standard! As mentioned above, ISO C++ explicitly says implementations can provide more library functions.
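A sketch of that situation, assuming a hypothetical library in which foo_acquire / foo_release stand in for foo and bar. Callers get an opaque handle and a documented promise that a release synchronizes with the next acquire; the implementation, which would normally live in a separate .cpp file, is a simple std::atomic<bool> spinlock. Nothing in ISO C++ names these functions, yet code using them is race-free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// ---- public interface (what the hypothetical library documents) ----
// foo_release(h) synchronizes with the foo_acquire(h) that next obtains h.
struct foo_handle;                        // opaque to callers
foo_handle* foo_create();
void foo_acquire(foo_handle* h);
void foo_release(foo_handle* h);
void foo_destroy(foo_handle* h);

// ---- implementation (normally hidden in a separate .cpp file) ----
struct foo_handle {
    std::atomic<bool> locked{false};
};

foo_handle* foo_create() { return new foo_handle; }

void foo_acquire(foo_handle* h) {
    // Acquire RMW: pairs with the release store in foo_release.
    while (h->locked.exchange(true, std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

void foo_release(foo_handle* h) {
    h->locked.store(false, std::memory_order_release);
}

void foo_destroy(foo_handle* h) { delete h; }

// Caller side: only the documented contract is needed to reason about it.
int foo_demo() {
    foo_handle* h = foo_create();
    int counter = 0;                      // non-atomic, protected by h
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            foo_acquire(h);
            ++counter;
            foo_release(h);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    foo_destroy(h);
    assert(counter == 20000);
    return counter;
}
```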


In actual reality for implementations that have pthreads, those functions are the primitive operations, the building blocks for std::mutex and std::thread. And they often use implementation-specific stuff like asm statements or hand-written asm functions. There is formal documentation for how such extensions work in many implementations, although you might have a hard time finding formal-enough documentation of every step to prove things correct all the way down. e.g. the way separately-written asm functions interact with C and C++ compiler-generated code is so well-known by compiler devs that it's maybe not all formally specified in terms of memory-model stuff, so there might be some level of "of course it works" if you want to drill down to the internal implementation of a specific libpthread.
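One way to observe this on a typical toolchain, assuming libstdc++ or libc++ built on pthreads, where std::mutex::native_handle_type is pthread_mutex_t* (this is implementation-specific, not guaranteed by ISO C++, hence the static_assert): the pthread mutex inside a locked std::mutex reports EBUSY to pthread_mutex_trylock. The function name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>
#include <pthread.h>
#include <type_traits>

// Assumption: libstdc++ or libc++ on a POSIX system. ISO C++ leaves
// std::mutex::native_handle_type implementation-defined; check it here.
static_assert(std::is_same<std::mutex::native_handle_type, pthread_mutex_t*>::value,
              "this sketch assumes std::mutex wraps a pthread_mutex_t");

bool std_mutex_wraps_pthread_mutex() {
    std::mutex m;
    m.lock();                                   // lock through the C++ API
    pthread_mutex_t* raw = m.native_handle();   // the pthread mutex inside it
    assert(raw != nullptr);
    int rc = pthread_mutex_trylock(raw);        // already locked: expect EBUSY
    m.unlock();
    return rc == EBUSY;
}
```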

std::atomic is implemented in terms of first-class language features in mainstream compilers. For example, GCC and Clang have builtin functions such as __atomic_load_n and __atomic_exchange_n, which they know how to inline, and they understand how memory accesses before/after an atomic operation may be optimized depending on the memory_order parameter.

pthread_mutex stuff could be implemented using mostly std::atomic, but real implementations of pthreads predate C++11 / C11.

For that and other reasons, they often have functions hand-written in asm, the same way strlen, memcmp and many other standard library functions are on real implementations. (They implement behaviour as if they were written in C or C++; I don't think the GCC manual explicitly spells out the language-lawyer aspect of calling a function written in asm, because it's so far from thinking in terms of the abstract machine. GNU C / C++ does define rules for things like asm("" ::: "memory") clobbers which are usable as a compiler memory barrier like std::atomic_signal_fence(seq_cst), but the actual definition is in terms of a more asm-based memory model of variables in memory having values that match what the abstract machine says they should.)
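The two compiler-only barriers mentioned above, side by side. This is a sketch for GCC/Clang (the asm statement syntax is a GNU extension); the names publish_gnu, publish_iso, payload and ready are hypothetical. Neither barrier emits a CPU fence instruction or makes the globals safe to share between threads; they only stop the compiler from moving the payload store past the flag store.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // plain (non-atomic) data
std::atomic<int> ready{0};       // flag, e.g. polled by a signal handler

// GNU C/C++ extension (GCC, Clang): an empty asm statement with a "memory"
// clobber. It emits no instruction, but the compiler must assume it reads and
// writes any reachable memory, so the store to payload can't be moved after it.
void publish_gnu(int v) {
    payload = v;
    asm volatile("" ::: "memory");
    ready.store(1, std::memory_order_relaxed);
}

// ISO C++ spelling: a fence that only orders against a signal handler on the
// same thread; GCC and Clang emit no CPU fence instruction for it.
void publish_iso(int v) {
    payload = v;
    std::atomic_signal_fence(std::memory_order_seq_cst);
    ready.store(1, std::memory_order_relaxed);
}
```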

In terms of implementing the required ordering guarantees, a hand-written asm function is fully opaque to the optimizer, and has to be assumed to do anything like mutex->lock(), atomic seq_cst load, store, atomic_thread_fence(seq_cst), and read + write all globally-reachable objects.

See "How does a mutex lock and unlock functions prevents CPU reordering?" for why being non-inline is sufficient. If you defined pthread_mutex_lock in a header so it could inline, you'd write it with sufficient ordering baked in using __atomic / std::atomic, or GNU C inline asm statements, using low-level implementation-defined behaviour to achieve the necessary high-level ordering compatible with how this implementation treats atomics and memory ordering in general.
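A minimal sketch of such a header-inlinable lock, assuming GCC or Clang (the __atomic builtins are GNU extensions) and hypothetical names: an acquire exchange to take the lock and a release store to drop it, the same ordering std::mutex provides. A real pthread_mutex_lock would also put waiting threads to sleep in the kernel (e.g. futex on Linux) instead of only spinning and yielding.

```cpp
#include <cassert>
#include <thread>

// Hypothetical header-only lock built on GNU __atomic builtins (GCC, Clang).
// Because calls can be inlined, the ordering has to be spelled out explicitly.
struct inline_lock {
    int state = 0;               // 0 = free, 1 = held; only accessed via __atomic
};

inline void inline_lock_acquire(inline_lock* l) {
    // Acquire RMW: later accesses can't be hoisted above a successful exchange.
    while (__atomic_exchange_n(&l->state, 1, __ATOMIC_ACQUIRE) != 0) {
        // Wait with cheap relaxed loads until it looks free, then retry.
        while (__atomic_load_n(&l->state, __ATOMIC_RELAXED) != 0) {
            std::this_thread::yield();
        }
    }
}

inline void inline_lock_release(inline_lock* l) {
    // Release store: earlier accesses can't sink below it, and it synchronizes
    // with the next successful acquire exchange.
    __atomic_store_n(&l->state, 0, __ATOMIC_RELEASE);
}

int inline_lock_demo() {
    inline_lock l;
    int counter = 0;             // non-atomic, protected by l
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            inline_lock_acquire(&l);
            ++counter;
            inline_lock_release(&l);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    assert(counter == 20000);
    return counter;
}
```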

The internal implementation for any given target ISA, using inline or stand-alone asm, will create synchronization on that ISA according to the hardware memory-ordering rules, perhaps using asm fence instructions or acquire loads / release stores, to be at least as strong as what ISO C++ requires. Perhaps stronger, if the ISA doesn't have a way to do just acquire RMWs and release stores: on x86, for example, any atomic RMW (such as lock cmpxchg) is a full barrier. Combined with the rules the implementation defines for asm, this can create high-level ordering.


Footnote 1: a missing definition isn't UB, it's an ill-formed program

http://eel.is/c++draft/intro.compliance#general-1

The set of diagnosable rules consists of all syntactic and semantic rules in this document except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior”.

http://eel.is/c++draft/basic.lookup#general-1

Unless otherwise specified, the program is ill-formed if no declarations are found.

So a warning or error is required for a call to an undeclared function. I guess if an implementation wanted to, it could "try" to run an ill-formed program anyway after warning about it, with no guarantees about behaviour? But then we're outside of C++ land. To silently break this code, an implementation would have to provide a definition of pthread_mutex_lock which didn't actually do locking.

(Or a pthread_mutex_lock that's incompatible with the cores that std::thread runs the program across? It's only guaranteed by POSIX to work for threads started by pthread_create, and I guess it's hypothetically possible to have a system where one thread-creation method runs threads across a set of cores that requires one flavour of asm instructions for correct locking, which doesn't work for threads created with a different interface.)

http://eel.is/c++draft/intro.defs#defns.undefined - definition of UB

behavior for which this document imposes no requirements

[Note 1: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message ([defns.diagnostic])), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression ([expr.const]) never exhibits behavior explicitly specified as undefined in [intro] through [cpp]. — end note]

So it's not UB to use these functions on a system that doesn't have them at all, it's just an ill-formed program.

Also note how the definition explicitly grants implementations permission to define whatever behaviour they want for things ISO C++ leaves undefined, as in the next footnote. The phrase "in a documented manner" implies that this includes things suitable for production use on those implementations.

Footnote 2: defining more behaviour

For example, GCC language extensions such as typedef uint32_t aliasing_u32 __attribute__((aligned(1),may_alias)) let you deref an aliasing_u32* pointing at any object without regard for the normal aliasing rules, the same way you can with unsigned char. Also without regard for the normal alignment rules.
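A small sketch of that extension in use, assuming GCC or Clang: load_u32_gnu reads 4 bytes from a misaligned address inside an unsigned char buffer through the may_alias typedef, and load_u32_portable does the same with std::memcpy, the ISO C++ way. Both function names, and loads_agree, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang extension quoted above: a 4-byte type that may alias any object
// (like unsigned char) and only requires 1-byte alignment.
typedef std::uint32_t aliasing_u32 __attribute__((aligned(1), may_alias));

// Load 4 bytes from any address, aligned or not, via the extension.
inline std::uint32_t load_u32_gnu(const void* p) {
    return *static_cast<const aliasing_u32*>(p);
}

// Portable ISO C++ equivalent; compilers usually turn it into a single load.
inline std::uint32_t load_u32_portable(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

bool loads_agree() {
    unsigned char buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    // buf + 1 is typically misaligned for an ordinary uint32_t.
    return load_u32_gnu(buf + 1) == load_u32_portable(buf + 1);
}
```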

Another example is compiling with g++ -fwrapv. g++ -fwrapv is a C++ implementation that defines the behaviour of signed integer overflow (as 2's complement wraparound). Not just as the result you happen to get from UB, but truly well-defined so the optimizer has to respect it. And -fwrapv -fsanitize=undefined won't check for signed-integer overflow.
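To illustrate the difference, a sketch with hypothetical function names: add_plain is only well-defined for every input on an implementation like g++ -fwrapv, while add_wrap (a different, portable technique) gets the same wraparound result by doing the addition in unsigned arithmetic.

```cpp
#include <cassert>
#include <climits>

// Only well-defined for every input on an implementation that defines signed
// overflow, e.g. g++ -fwrapv. In plain ISO C++, add_plain(INT_MAX, 1) is UB.
inline int add_plain(int a, int b) { return a + b; }

// Portable wraparound: unsigned arithmetic is always modulo 2^N. Converting
// the result back to int is modular since C++20 (implementation-defined
// before that; GCC and Clang wrap).
inline int add_wrap(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}
```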

Such code would, or could, encounter undefined behaviour if compiled with an implementation that only defines the bare minimum ISO C++ requires, but not on the implementations it's written for.

ISO C++ doesn't overrule the ability of an implementation to define the behaviour of various operations, including inline asm, separately-compiled asm, and language extensions like GNU C __atomic.

Answer by Peter

It depends what meaning and importance you attach to the word "undefined".

A lot of discussions related to undefined behaviour in C++ programs get bogged down in the notion that implementations (compilers, libraries, host platforms, etc) actually attach some meaning to constructs that the C++ standard leaves undefined.

A C++ language pedant's view

Every C++ standard (whether draft or ratified) quite explicitly defines the meaning of the term "undefined behavior". For example, the C++20 draft N4849 specifies the meaning as "behavior for which this document imposes no requirements".

By this definition, any feature (extended language feature, additional library function, etc) provided by an implementation that is not covered by the C++ standard has undefined behaviour.

This (narrow!) definition means that use of "platform-provided synchronous operations" introduces undefined behaviour.

This view can be quite practically important in some (not all) projects where portability (between compilers, platforms, etc) is a significant requirement. Avoiding undefined behaviour (and mitigating or managing reliance on unspecified or implementation-defined behaviour) is one way to reduce problems when porting code.

A slightly more liberal view based on other standards or specifications

A slightly more liberal view would allow for accepting some other standard (other than the C++ standard) or some authoritative specification that specifies those operations.

So, for example, if the operations are described somewhere in the POSIX family of standards, they are not undefined.

Or, if the operations are specified in a hardware reference manual, they are not undefined.

The key here is a decision to accept that compliance with the other standard or specification is appropriate (e.g. for purposes of the project at hand).

There is not some blanket rule that makes a standard or specification acceptable. For example, hardware-specific operations cannot be considered acceptable if there is a requirement for portability to another hardware platform that doesn't support those operations at all (or, worse, supports them but produces different behaviours on some platforms).

A pragmatic "I just want my program to work as I intend" view

In one (of many) pragmatic views, the importance of behaviour being undefined (or not) comes down to the needs of a project. The purpose of avoiding undefined behaviour is ensuring that a program works, regardless of which compiler is used, what compiler settings, what target platform, etc.

It is quite possible to be flexible in the face of a requirement that a program (or system) work correctly across multiple target platforms, when built with different compilers, and with minimal platform-specific code.

For example, it might be decided that the code should be translated (compiled, linked, etc) with a C++11 or later compiler, but also accept usage of (some subset of) POSIX-compliant functions. Such a decision is perfectly reasonable if (sometimes this is a big "if") acceptable quality POSIX-compliant libraries are available on all target platforms. Then the only things that remain undefined are those that are left undefined (either explicitly, or by omission) by both the C++ standard and the POSIX family of standards. For example, neither C++ nor POSIX standards specify any constraints on program behaviour if a program uses both std::thread and pthreads, so using both is a more risky acceptance of undefined behaviour (whatever your definition of "undefined").

Whether the decision is to allow use of "platform provided synchronous operations" or not, it is still advisable to verify that the program works as required. This means having suitable test cases to verify correctness, and running those test cases on a representative set of target platforms.

Answer by Nate Eldredge

Echoing what others have said, in the context of ISO C++ alone, the program literally has undefined behavior, in the trivial sense that the ISO C++ standard does not define the behavior of pthread_mutex_lock. It doesn't even mention the word! So technically, a conforming ISO C++ compiler could provide a pthread_mutex_lock that was only well-defined with pthread_create threads, not std::thread. Or a Deathstation 9000 version that's intentionally UB or accidentally broken, reordering memory accesses around a call to pthread_mutex_lock(); but it could also cause the call to summon nasal demons. If there was no declaration at all, it couldn't do those things without at least warning about an ill-formed program first, and (most likely in real life) would refuse to compile it at all (e.g. "pthread_mutex_lock was not declared in this scope").

But while that statement is true, it is a bit silly. It's like saying that the word "perro" is undefined because it doesn't appear in an English dictionary. Well, of course not; it isn't an English word at all, but a Spanish word. You wouldn't normally see it in a sentence that's meant to be English, but rather in Spanish sentences. And so if you want a definition, you ought to be looking in a Spanish dictionary. Then you will see a perfectly reasonable definition, that it is a furry mammal with four legs and a tail that is often kept as a pet.

Likewise, you won't see pthread_mutex_lock in a program that is honestly meant to be pure ISO C++. It's not an ISO C++ function, but rather a POSIX one (that's what the p stands for after all), so we only expect to see it in POSIX programs, and so we ought to be looking at the POSIX spec. Or more precisely, IEEE Std 1003.1-2017. So a C++ implementation that provides a broken pthread_mutex_lock could be ISO C++ conforming but not POSIX.

Now there's a difficulty in reading the POSIX spec in conjunction with the ISO C or C++ standards, because, while the POSIX system interfaces (i.e. system calls) are described as a C API, the multithreading functions aren't defined using the C11/C++11 formal memory model. So you won't find an explicit statement like "pthread_mutex_unlock() synchronizes-with pthread_mutex_lock()", or that one critical section happens-before another, or anything like that.

What you do find is Section 4.12 of Base Definitions, which includes pthread_mutex_[un]lock in a list of functions which, in their words, "synchronize memory". They don't provide a formal definition of that term as far as I can find, so we have to interpret using common sense. It seems clear that the intention is to say that memory accesses are not reordered around these calls, and translating that into the C++ memory model would tell us that pthread_mutex_unlock synchronizes with a subsequent pthread_mutex_lock of the same mutex, providing the appropriate happens-before ordering on the critical sections and avoiding a data race.
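Under that reading, a pthread mutex can be wrapped in the C++ BasicLockable interface (lock/unlock) and used with std::lock_guard exactly like std::mutex. A minimal sketch, with hypothetical names posix_mutex and locked_counter_demo, assuming a POSIX system. Note that its correctness rests on the common-sense interpretation above, since POSIX itself doesn't use the C++ synchronizes-with vocabulary.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <stdexcept>
#include <thread>

// Hypothetical wrapper: exposes a pthread mutex through the C++
// BasicLockable interface, so std::lock_guard can use it.
class posix_mutex {
public:
    posix_mutex() {
        if (pthread_mutex_init(&m_, nullptr) != 0)
            throw std::runtime_error("pthread_mutex_init failed");
    }
    ~posix_mutex() { pthread_mutex_destroy(&m_); }
    posix_mutex(const posix_mutex&) = delete;
    posix_mutex& operator=(const posix_mutex&) = delete;

    // Return codes ignored for brevity.
    void lock() { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }

private:
    pthread_mutex_t m_;
};

int locked_counter_demo() {
    posix_mutex mut;
    int counter = 0;                            // non-atomic, protected by mut
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            std::lock_guard<posix_mutex> guard(mut);
            ++counter;
        }
    };
    std::thread t1(work), t2(work);             // std::thread, not pthread_create
    t1.join();
    t2.join();
    return counter;
}
```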

There's the other detail that POSIX speaks of threads only as things which are created with pthread_create and not with std::thread::thread(). So again, we have to use common sense to interpret. The ISO C++ std::thread is, in context, clearly tailor-made to be a wrapper around system-specific calls such as pthread_create, so it's entirely reasonable to assume that the kinds of threads they create have the same behavior.
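On common implementations this is directly observable. A sketch assuming libstdc++ or libc++ on a POSIX system, where std::thread::native_handle() returns the underlying pthread_t (ISO C++ leaves native_handle_type implementation-defined, hence the static_assert): the handle seen from outside the thread matches pthread_self() inside it. The function name is hypothetical.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>
#include <type_traits>

// Assumption: std::thread is built on pthread_create, so its native handle is
// a pthread_t. Check that at compile time rather than taking it for granted.
static_assert(std::is_same<std::thread::native_handle_type, pthread_t>::value,
              "this sketch assumes std::thread wraps a pthread_t");

bool std_thread_is_a_pthread() {
    pthread_t seen_inside{};
    std::thread t([&] { seen_inside = pthread_self(); });
    pthread_t handle = t.native_handle();     // underlying pthread_t
    t.join();                                 // join() makes seen_inside visible
    return pthread_equal(handle, seen_inside) != 0;
}
```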

So in the context of ISO C++ together with POSIX, together with reasonable interpretation, the behavior of this program is perfectly well defined.

As we mentioned above, ISO C++ allows for compilers which would reorder memory access around pthread_mutex_lock (causing a data race), or make it summon nasal demons, or not compile it at all. And this presumably is what you are worried about with your question. But what we see from this discussion is that such compilers cannot be provided as part of a conforming POSIX implementation. A C++ compiler that is usable on a POSIX system must allow pthread_mutex_lock to compile, and make it behave as the POSIX spec says, and not reorder memory accesses around it.

Such a compiler could in theory be implemented by special-casing pthread_mutex_lock and other function calls defined by POSIX. But that's impractical, so what's actually done (at least by reasonable C++ compilers that want to be usable on such platforms) is that they don't reorder memory access around any function call unless they know precisely what that call does, and can prove that such reordering won't violate any relevant standard. Typically, this means that reordering around a function call only happens in the following cases:

  1. The function's source code is visible to the compiler, and consists only of code that provably allows such reordering. In particular, it must contain no calls to functions which are not themselves visible in the same fashion.

  2. The function is one whose behavior is completely specified by ISO C++ or another relevant standard, and for which reordering would not break any rules.

Since pthread_mutex_lock does not fit either case, the compiler won't reorder around it, and everything just works.


Saying that a given program has undefined behavior under ISO C++ is not inherently a bad thing. It is bad in the following contexts:

  1. If the program in question is intended to be portable across all ISO C++ implementations.

  2. If the program is intended to be portable across some subset of ISO C++ implementations, where some or all of them either do not define the behavior, define it inconsistently, or define it as doing something unwanted.

Neither of those apply here. A program containing pthread_mutex_lock cannot possibly be intended to be portable across all ISO C++ implementations, not by anyone with half a brain cell. It's more likely intended to be portable across POSIX C++ implementations, and as we've said, those implementations do consistently define its behavior as doing what we want.