Type-punning, or reinterpreting the underlying bits of one type as another type, is notorious for having unpredictable and/or non-portable behavior.
For example:
union {
unsigned u;
float f;
} c = {.u = 10};
float f = c.f;
Not portable: the result depends on the representation of float.
union {
unsigned char c[2];
unsigned short s;
} c = {.c = {1, 2}};
short s = c.s;
Not portable: the result depends on the value of CHAR_BIT, on the size of unsigned short, and on the byte order (endianness) of the system.
However, provided all the <stdint.h> types are defined, do any of the following have Standard-guaranteed, portable behavior?
union {
uint8_t b[2];
uint16_t w;
} c = {.b = {0x18, 0x18}};
assert(c.w == 0x1818);
Or the converse:
union {
uint8_t b[2];
uint16_t w;
} c = {.w = 0x1818};
assert(c.b[0] == 0x18 && c.b[1] == 0x18);
Or if I extend the size of the types:
union {
uint16_t w[2];
uint32_t l;
} c = {.w = {0x1818, 0x1818}};
assert(c.l == 0x18181818);
In the above examples, byte order does not matter because the number is 'cyclic': it has the same representation in big-endian, little-endian, or any other esoteric byte order. The types are guaranteed to be exactly the specified number of bits wide and to have no trap representations or padding bits.
For those reasons, there is no logical reason for the type-pun to be non-portable or to yield any value other than those checked in the assert() calls, but does the C Standard make the same guarantee explicitly? Are these examples truly portable?
The C Standard states that reading an inactive union member 'reinterprets' the bits as the new type, but does that mean the above examples have portable behavior? Or could some technically conforming C99 implementation, through some oddity, compile them but not produce the expected results?
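The three <stdint.h> snippets in the question can be collected into small helper functions for experimenting. The function names are hypothetical, chosen only for this sketch; each one initializes one union member and returns a different one, exactly as the snippets do. Running them on a particular compiler only shows what that implementation does, not what the Standard guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for the first snippet: initializes the byte array
   and returns the uint16_t member, which was not the last one stored. */
uint16_t pun_bytes_to_u16(uint8_t b0, uint8_t b1)
{
    union {
        uint8_t b[2];
        uint16_t w;
    } c = {.b = {b0, b1}};
    return c.w;
}

/* Second snippet: stores a uint16_t and returns one of its bytes. */
uint8_t pun_u16_to_byte(uint16_t value, int index)
{
    assert(index == 0 || index == 1);
    union {
        uint8_t b[2];
        uint16_t w;
    } c = {.w = value};
    return c.b[index];
}

/* Third snippet: two uint16_t halves read back as one uint32_t. */
uint32_t pun_halves_to_u32(uint16_t w0, uint16_t w1)
{
    union {
        uint16_t w[2];
        uint32_t l;
    } c = {.w = {w0, w1}};
    return c.l;
}
```

For example, pun_halves_to_u32(0x1818, 0x1818) is the third snippet in function form; the question is whether it must return 0x18181818 on every implementation that defines these types.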
Type-punning refers to reinterpreting a representation of a type as another type. If types are guaranteed to have the same or sufficiently well-defined representations, then type-punning may be portable.
This is confirmed explicitly in 6.2.5 Footnote 39 (Emphasis mine):
Integers
This means that any unsigned value no greater than the maximum value of the corresponding signed type will have the same value when type-punned, and vice versa, since all the corresponding bits in the representation must have the same effect on the final value. This is guaranteed explicitly:
Additionally:
An object representation with all bits 0 represents the value 0. Therefore, any part of an all-zero integer object can be type-punned to a smaller integer type, and multiple smaller integers with all bits 0 (including padding bits, if any) can be type-punned to a larger one; in both cases the value will still be 0.
This partially addresses the examples in the question: since the fixed-width types have no padding bits, those examples would work with the value 0.
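The two integer guarantees discussed above can be exercised with a short sketch. The helper names are hypothetical: the first pair covers nonnegative values shared by int and unsigned int, and the second pair covers an all-zero uint32_t viewed as four uint8_t bytes and vice versa. The sketch assumes the exact-width types exist, so CHAR_BIT is 8 and uint32_t occupies four bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a nonnegative int and reads it back as
   unsigned int. Values in the shared range have the same representation
   in both types, so the input should come back unchanged. */
unsigned int pun_int_to_unsigned(int value)
{
    assert(value >= 0); /* only the shared range is covered */
    union {
        int i;
        unsigned int u;
    } c = {.i = value};
    return c.u;
}

/* Reverse direction: an unsigned value no larger than INT_MAX read as int. */
int pun_unsigned_to_int(unsigned int value)
{
    assert(value <= (unsigned int)INT_MAX);
    union {
        int i;
        unsigned int u;
    } c = {.u = value};
    return c.i;
}

/* Stores 0 in a uint32_t and checks that every byte seen through the
   union is 0 (all bits zero means zero; no padding bits exist). */
int zero_u32_has_zero_bytes(void)
{
    union {
        uint32_t l;
        uint8_t b[4];
    } c = {.l = 0};
    assert(sizeof c.l == sizeof c.b); /* holds when uint8_t exists */
    for (size_t i = 0; i < sizeof c.b; i++) {
        if (c.b[i] != 0)
            return 0;
    }
    return 1;
}

/* Reverse direction: four zero bytes read back as one uint32_t. */
uint32_t zero_bytes_to_u32(void)
{
    union {
        uint8_t b[4];
        uint32_t l;
    } c = {.b = {0, 0, 0, 0}};
    return c.l;
}
```

Negative int values are deliberately excluded by the first assert: outside the shared range, the result depends on the signed representation.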
Fixed-width integers (if defined)
Type-punning intN_t to uintN_t for the same N is equivalent to adding 2^N to the value if the intN_t value is negative. The reverse is equivalent to subtracting 2^N from the value if the uintN_t value is greater than the maximum value of intN_t.
This requirement guarantees that there are no padding bits and, since both types have the same total number of bits, the number of value bits in the intN_t must be one less than the number of value bits in the uintN_t.
And since all N-1 value bits in the intN_t must have the same values as the corresponding bits in the representation of uintN_t, and two's complement is required for all fixed-width signed types, by process of elimination the sign bit in intN_t must correspond to the value bit with value 2^(N-1) in the uintN_t. Thus, type-punning between them must have portable behavior, as specified above.
Pointers
In 6.2.5:
This implies one can safely type-pun between void * and char *, between any two struct pointers, between any two union pointers, or between pointers to qualified or unqualified versions of compatible types (e.g., int * and const int *). Any object pointer can also be converted to void * (implicitly) or to char * (with a cast), but that is a conversion, not a type-pun.
Structures
Type-punning between a structure and another structure or type is generally non-portable because of the unspecified amount of padding inserted between structure members. However, there are some exceptions:
In 6.5.2.3:
Type-punning is portable between a structure and its first member, and between a union and any of its members, provided that reinterpreting the last-stored member of the union as the accessed member is itself portable.
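The structure/first-member case can be sketched as follows. The types struct header and struct message and the function message_kind are hypothetical. The sketch relies on the rule (6.7.2.1 in C99 and C11) that a pointer to a structure, suitably converted, points to its initial member, and that there is no padding at the beginning of a structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct header {
    int kind;
};

struct message {
    struct header hdr; /* first member: no padding can precede it */
    double payload;
};

/* Reads the header through a pointer converted from struct message *.
   A suitably converted pointer to a structure points to its first member. */
int message_kind(const struct message *m)
{
    assert(m != NULL);
    const struct header *h = (const struct header *)m;
    return h->kind;
}
```

For a message initialized as {{7}, 1.5}, message_kind returns 7, the value of its hdr.kind member.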
Additionally:
This means that if several separate structure types begin with members of compatible types in the same order (a common initial sequence), those matching members may be accessed through any of the structure types, as long as a union containing them is declared and its complete type is visible at the point of access. Example from the C Standard:
Union between arrays of smaller fixed-width types and larger fixed-width types
This means that, provided there are no padding bits (as is the case for fixed-width types), type-punning two consecutive objects to a single object twice as large is guaranteed to concatenate their object representations. Contiguous implies that there can be no 'junk' between the raw bytes in memory.
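A non-cyclic pattern makes the concatenation, and the byte order, visible. In this sketch (function names hypothetical) the union result is compared against the two common byte orders computed with shifts, which operate on values and therefore do not depend on the machine's memory layout. It assumes the implementation uses one of those two orders; that covers mainstream hardware but is an assumption, not something the Standard states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: concatenates two bytes through a union. With a
   non-cyclic pattern the result reveals the byte order. */
uint16_t concat_via_union(uint8_t first, uint8_t second)
{
    union {
        uint8_t b[2];
        uint16_t w;
    } c = {.b = {first, second}};
    assert(sizeof c.w == sizeof c.b); /* no padding, two 8-bit bytes */
    return c.w;
}

/* The two byte-order-dependent results, computed arithmetically. */
uint16_t concat_little_endian(uint8_t first, uint8_t second)
{
    return (uint16_t)(first | (second << 8));
}

uint16_t concat_big_endian(uint8_t first, uint8_t second)
{
    return (uint16_t)((first << 8) | second);
}
```

With the bytes 0x12 and 0x34, the two candidates are 0x3412 (little-endian) and 0x1234 (big-endian); with 0x18 and 0x18 both candidates are 0x1818, which is the point of the question's cyclic values.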
But does pure binary notation guarantee that the value bits in the object representation are ordered by increasing magnitude?
Pure binary notation is defined as:
This definition explicitly mentions the representation, successive bits, and position, which implies that the bits in pure binary notation are ordered from lowest to highest value. If that were not the case, the exact position of a bit within the representation would be meaningless, and the definition would not need to mention position or successive bits; it would only need to say that a value bit exists for each power of 2 from 2^0 up to the highest one. Instead, the definition specifies that the bits are successive and ordered.
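The positional meaning in that definition (successive bits, each worth the next power of 2, added together) can be written out directly. The helper names are hypothetical, and the sketch uses shifts, which operate on values: it illustrates what each bit position is worth, not how those bits are laid out in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds a uint16_t from its 16 value bits,
   listed from lowest weight (2^0) to highest (2^15). Each successive
   bit is multiplied by the next power of 2 and the results are added. */
uint16_t value_from_bits(const unsigned char bits[16])
{
    uint32_t value = 0;
    for (int i = 0; i < 16; i++) {
        assert(bits[i] == 0 || bits[i] == 1);
        value += (uint32_t)bits[i] << i;
    }
    return (uint16_t)value;
}

/* Inverse: extracts the 16 value bits of a uint16_t, lowest weight first. */
void bits_from_value(uint16_t value, unsigned char bits[16])
{
    for (int i = 0; i < 16; i++)
        bits[i] = (unsigned char)((value >> i) & 1u);
}
```

For 0x1818, bits_from_value sets positions 3, 4, 11 and 12, and value_from_bits turns that list back into 0x1818.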
Why require that the value bits of signed integers have the same values as the corresponding bits of the unsigned type, if that would be redundant? Most likely, to ensure that the placement of the padding and/or sign bit does not 'offset' the value bits relative to those of the corresponding unsigned type.
Given the above, concatenating identical copies of a fixed number of ordered bits into a new, larger fixed number of ordered bits must produce the same value regardless of byte order. A case could be made that any implementation that does not exhibit the expected behavior in that case would violate the definition of pure binary notation.
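The argument can be checked mechanically under one simplifying assumption: that a byte order is some permutation of whole 8-bit bytes. The sketch below (hypothetical names) assembles a uint32_t from four bytes in all 24 possible orders using shifts. A cyclic pattern such as 0x18 repeated gives the same value every time; a non-cyclic one does not. Implementations that scatter individual bits across bytes are outside this model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assembles a uint32_t from four bytes, placing
   bytes[k] at bit offset 8 * order[k]. 'order' models a byte order as
   a permutation of {0, 1, 2, 3} (an assumption of this sketch). */
uint32_t assemble(const uint8_t bytes[4], const int order[4])
{
    uint32_t value = 0;
    for (int k = 0; k < 4; k++) {
        assert(order[k] >= 0 && order[k] < 4);
        value |= (uint32_t)bytes[k] << (8 * order[k]);
    }
    return value;
}

/* Returns 1 if all 24 byte orders yield the same value, else 0. */
int same_value_for_all_orders(const uint8_t bytes[4])
{
    int order[4];
    uint32_t first = 0;
    int have_first = 0;
    for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
            for (int c = 0; c < 4; c++)
                for (int d = 0; d < 4; d++) {
                    if (a == b || a == c || a == d || b == c || b == d || c == d)
                        continue; /* not a permutation */
                    order[0] = a;
                    order[1] = b;
                    order[2] = c;
                    order[3] = d;
                    uint32_t v = assemble(bytes, order);
                    if (!have_first) {
                        first = v;
                        have_first = 1;
                    } else if (v != first) {
                        return 0;
                    }
                }
    return 1;
}
```

For instance, with bytes {0x18, 0x18, 0x18, 0x18} every order yields 0x18181818, while {0x12, 0x34, 0x56, 0x78} yields 0x78563412 in little-endian order {0, 1, 2, 3} and 0x12345678 in big-endian order {3, 2, 1, 0}.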