Unspecified behaviour about "object having more than one object representation"


Still struggling with C (C99) undefined and unspecified behaviours.

This time it is the following Unspecified Behaviour (Annex J.1):

The representation used when storing a value in an object that has more than one object representation for that value (6.2.6.1).

The corresponding section 6.2.6.1 states:

Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result.(43) Where a value is stored in an object using a type that has more than one object representation for that value, it is unspecified which representation is used, but a trap representation shall not be generated.

with the following note 43:

It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof(T)) == 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them.

I don't even understand what would be a value that has more than one object representation. Is it related, for example, to a floating point representation of 0 (negative and positive zero)?


There are 3 answers

zwol (best answer)

Most of this language is the C standard going well out of its way to allow for continued use on Burroughs B-series mainframes (AFAICT the only surviving ones'-complement architecture). Unless you have to work with those, or certain uncommon microcontrollers, or you're seriously into retrocomputing, you can safely assume that the integer types have only one object representation per value, and that they have no padding bits. You can also safely assume that all integer types have no trap representations, except that you must take this line of J.2

[the behavior is undefined if ...] the value of an object [crossed out: with automatic storage duration] is used while it is indeterminate

as if it were normative and as if the crossed-out words were not present. (This rule is not supported by a close reading of the actual normative text, but it is nonetheless the rule adopted by all of the current generation of optimizing compilers.)

Concrete examples of types that can have more than one object representation for a value on a modern, non-exotic implementation include:

  • _Bool: the effect of overwriting a _Bool object with the representation of an integer value other than an appropriately sized 0 or 1 is unspecified.

  • pointer types: some architectures ignore the low bits of a pointer to a type whose minimum alignment is greater than 1 (e.g. (int*)0x8000_0000 and (int*)0x8000_0001 might be treated as referring to the same int object; this is an intentional hardware feature, facilitating the use of tagged pointers)

  • floating point types: IEC 60559 allows all of the many representations of NaN to be treated identically (and possibly squashed together) by the hardware. (Note: +0 and −0 are distinct values in IEEE floating point, not different representations of the same value.)

These are also the scalar types that can have trap representations in modern implementations. In particular, Annex F specifically declares the behavior of signaling NaN to be undefined, even though it's well-defined in an abstract implementation of IEC 60559.

chqrlie

As you suspected, -0.0 is a good candidate but only for the last phrase:

Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them.

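#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 0.0;
    double y = -0.0;
    /* == compares values: +0.0 and -0.0 compare equal */
    if (x == y) {
        printf("x and y have the same value\n");
    }
    /* memcmp compares object representations: the sign bit differs */
    if (memcmp(&x, &y, sizeof(double)) != 0) {
        printf("x and y have a different representation\n");
    }
    /* division tells them apart: +infinity versus -infinity */
    if (1 / x != 1 / y) {
        printf("1/x and 1/y have a different value\n");
    }
    return 0;
}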

Another example of a value with more than one possible representation is NaN. 0.0 / 0.0 evaluates to a NaN value, which may have a different representation from the one produced by the macro NAN or another operation producing NaN or even the same expression 0.0 / 0.0 evaluated again. memcmp() may show that the representations differ. This example however does not really illustrate the purpose of the Standard's quote in the question as these values do not match per the == operator.

The text you quoted from Annex J seems to specifically address architectures, rare nowadays, that have padding bits and/or a representation of negative numbers with two different representations for 0. All modern systems use two's complement to represent negative numbers, where all bit patterns represent different values, but four decades ago some fairly common mainframes used ones' complement or sign and magnitude, where two different bit patterns could represent the value 0.

John Bollinger

I don't even understand what would be a value that has more than one object representation. Is it related, for example, to a floating point representation of 0 (negative and positive zero)?

No, negative and positive zero are different values.

In practice, you probably don't need to worry about values with different object representations, but one possible example would involve integer types that include padding bits. For example, suppose your implementation provided a 15-(value-)bit unsigned integer type, whose storage size was 16 bits. Suppose also that the padding bit in the representation of that type were completely ignored for the purpose of evaluating objects (that is, that the type afforded no trap representations). Then each value representable by that type would have two distinct object representations, differing in the value of the padding bit.

The standard says that in such a case, you cannot rely on a particular choice between those object representations being made under any given circumstances, but also that the choice doesn't matter when such objects are operands of any C operator. Note 43 clarifies that the difference may nevertheless be felt in other ways.