Is it undefined behavior to use functions with side effects in an unspecified order?


I know that expressions like x = x++ + ++x invoke undefined behavior, because a variable is modified more than once between two sequence points. That's thoroughly explained in this post: Why are these constructs using pre and post-increment undefined behavior?

But consider something like printf("foo") + printf("bar"). The function printf returns an int, so the expression is valid in that sense. But the order of evaluation of the operands of + is not specified in the standard, so it is not clear whether this will print foobar or barfoo.

My question is whether this, too, is undefined behavior.
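
For concreteness, here is a minimal complete program containing the expression (the variable n is only there to use the result):

#include <stdio.h>

int main(void)
{
    /* Prints either foobar or barfoo; printf returns the number of
       characters written, so n is 6 either way. */
    int n = printf("foo") + printf("bar");
    printf("\n%d\n", n);
    return 0;
}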


There are 4 answers

Eric Postpischil (Best Answer)

printf("foo") + printf("bar") does not have undefined behavior (except for the caveat noted below) because the function calls are indeterminately sequenced and are not unsequenced.

C effectively has three possibilities for sequencing:

  • Two things, A and B, may be sequenced in a particular order, one of A before B or B before A.
  • Two things may be indeterminately sequenced, so that A is sequenced before B or vice-versa, but it is unspecified which.
  • Two things are unsequenced.

To distinguish between the latter two, suppose writing to stdout requires putting bytes in a buffer and updating the counter of how many bytes are in the buffer. (For this, we will neglect what happens when the buffer is full or should be sent to the output device.) Consider two writes to stdout, called A and B.

If A and B are indeterminately sequenced, then either one can go first, but both of its parts—writing the bytes and updating the counter—must be completed before the other one starts. If A and B are unsequenced, then nothing controls the parts; we might have: A puts its bytes in the buffer, B puts its bytes in the buffer, A updates the counter, B updates the counter.

In the former case, both writes are completed, but they can be completed in either order. In the latter case, the behavior is undefined. One of the possibilities is that B writes its bytes in the same place in the buffer as A’s bytes, losing A's bytes, because the counter was not updated to tell B where its new bytes should go.
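
As a toy model of that (buf, count, and buffered_write below are purely illustrative names, not how any real stdio is implemented), the two parts of one write might look like:

/* Toy model of a buffered write -- illustrative only. */
#include <string.h>

static char buf[64];   /* the output buffer           */
static int  count;     /* how many bytes are buffered */

static void buffered_write(const char *s, int n)
{
    memcpy(buf + count, s, n);   /* part 1: put the bytes in the buffer */
    count += n;                  /* part 2: update the counter          */
}

If two calls to buffered_write are indeterminately sequenced, each call finishes both parts before the other starts, so both strings end up in the buffer in one order or the other. If the calls were unsequenced, part 1 of the second call could run before part 2 of the first, so the second call would place its bytes at the same offset as the first call's and overwrite them.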

In printf("foo") + printf("bar"), the writes to stdout are indeterminately sequenced. This is because the function calls provide sequence points that separate the side effects, but we do not know in which order they are evaluated.

C 2018 6.5.2.2 10 tells us that function calls introduce sequence points:

There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.

Thus, if the C implementation happens to evaluate printf("foo") second, there is a sequence point just before the actual call, and the evaluation of printf("bar") must have been sequenced before this. Conversely, if the implementation evaluates printf("bar") first, then printf("foo") must have been sequenced before it. So, there is sequencing, albeit indeterminate.

Additionally, 7.1.4 3 tells us:

There is a sequence point immediately before a library function returns.

Therefore, the two function calls are indeterminately sequenced. The rule in 6.5 2 about unsequenced side effects does not apply:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…

(Not to mention that the side effects here are on the output stream, not on a scalar object.)
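
To put the two cases side by side (a sketch; the undefined statement is commented out on purpose):

#include <stdio.h>

int main(void)
{
    int i = 0;

    /* Undefined: two unsequenced side effects on the scalar object i. */
    /* i = i++ + ++i; */
    (void)i;   /* only here to silence unused-variable warnings */

    /* Merely unspecified order: the side effects are on the stream, the two
       calls are indeterminately sequenced, and each returns 3. */
    int n = printf("foo") + printf("bar");
    printf("\n%d\n", n);
    return 0;
}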

Caveat

One hazard is that the C standard permits standard library functions to be implemented as function-like macros (C 2018 7.1.4 1). In that case, the reasoning above about sequence points might not apply. A program can force actual function calls by enclosing the name in parentheses, so that it is not treated as an invocation of a function-like macro: (printf)("foo") + (printf)("bar").
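
In code, the macro-suppressing form looks like this (the (void) cast is only there to discard the unused result):

#include <stdio.h>

int main(void)
{
    /* Parenthesizing the names prevents expansion of any function-like
       macro definition, so these are guaranteed to be real function calls. */
    (void)((printf)("foo") + (printf)("bar"));
    return 0;
}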

supercat

As noted elsewhere, if two function calls are used in an expression, a compiler may choose, in unspecified fashion, which one will be invoked first, but all parts of one call must precede all parts of the other. By contrast, if two operations are unsequenced, a compiler may interleave their parts.

A point I haven't seen mentioned, however, is that while many compilers process operations on primitive types in such a way that the distinction between "unsequenced" and "indeterminately sequenced" doesn't matter, some optimizers may produce machine code where it does matter, especially in multi-threaded scenarios, so it's worth being aware of the distinction.

Consider a function like the following, if processed by gcc 9.2.1 with options -xc -O3 -mcpu=cortex-m0 [the Cortex-M0 is a popular current-production 32-bit core found in low-end microcontrollers]:

#include <stdint.h>
uint16_t incIfUnder32768(uint16_t *p)
{
    uint16_t temp = *p;               /* *p is read exactly once       */
    return temp - (temp >> 15) + 1;   /* adds 1 only when temp < 32768 */
}

One might expect that if another thread were to change *p during the function, it would either perform the computation based upon the value of *p before the change, or perform the computation based upon the value after. The optimizer for gcc 9.2.1, however, will generate machine code as though the source code were written:

#include <stdint.h>
uint16_t incIfUnder32768(uint16_t *p)
{
    return *p - (*p >> 15) + 1;   /* *p is read twice; the two reads may differ */
}

If the value of *p were to change, e.g. from 0xFFFF to 0, or from 0 to 0xFFFF, between those two reads, the function could return 0, even though there is no value *p could have held that would yield that result (a value below 32768 yields at least 1, and a value of 32768 or above yields itself).

Although compilers at the time the Standard was written would almost invariably extend the semantics of the language by processing many actions "in a documented fashion characteristic of the environment" regardless of whether the Standard required them to do so, some "clever" compiler writers seek to exploit opportunities where deviating from such behaviors allows "optimizations" that might or might not actually make the code more efficient.

0___________

No, it is not.

It is unspecified behaviour.


Asteroids With Wings

You're probably asking because a program that tries to read an unspecified value (e.g. uninitialised int) has undefined behaviour.

That is not the case with unspecified order or indeterminately sequenced operations. You don't know what you'll get, but the program has well-defined behaviour.

The writes to stdout don't cause a problem either, because the value is not "unspecified" in that sense. You can think of it more as an implementation-defined value that results from the unspecified ordering.
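
A sketch of the contrast (the undefined read is deliberately left in comments):

#include <stdio.h>

int main(void)
{
    /* int x;              -- an uninitialised int holds an indeterminate value;
       printf("%d\n", x);  -- reading it here would be undefined behaviour      */

    /* Unspecified order, well-defined behaviour: prints either foobar or
       barfoo, and n is 6 either way. */
    int n = printf("foo") + printf("bar");
    printf("\n%d\n", n);
    return 0;
}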

tl;dr: not everything "unspecified" leads to being "undefined".