Assuming unsigned int has no trap representations, do either or both of the statements marked (A) and (B) below provoke undefined behavior, why or why not, and (especially if you think one of them is well-defined but the other isn't), do you consider that a defect in the standard? I am primarily interested in the current version of the C standard (i.e. C2011), but if this is different in older versions of the standard, or in C++, I would also like to know about that.
(_Alignas is used in this program to eliminate any question of UB due to inadequate alignment. The rules I discuss in my interpretation, though, say nothing about alignment.)
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        unsigned int v1, v2;
        unsigned char _Alignas(unsigned int) b1[sizeof(unsigned int)];
        unsigned char *b2 = malloc(sizeof(unsigned int));
        if (!b2) return 1;

        memset(b1, 0x55, sizeof(unsigned int));
        memset(b2, 0x55, sizeof(unsigned int));

        v1 = *(unsigned int *)b1; /* (A) */
        v2 = *(unsigned int *)b2; /* (B) */

        return !(v1 == v2);
    }
My interpretation of C2011 is that (A) provokes undefined behavior but (B) is well-defined (to store an unspecified value into v2), because:
memset is defined (§7.24.6.1) to write to its first argument as-if through an lvalue with character type, which is allowed for both b1 and b2 per the special case at the bottom of §6.5p7.
The object b1 has a declared type, unsigned char[n]. Therefore, its effective type for accesses is also unsigned char[n] per 6.5p6. Statement (A) reads b1 via an lvalue expression whose type is unsigned int, which is not the effective type of b1 nor any of the other exceptions in 6.5p7, so the behavior is undefined.
The object pointed-to by b2 has no declared type. The value stored into it (by memset) was (as-if) through an lvalue with character type, so the second case of 6.5p6 does not apply. The value was not copied from anywhere, so the third case of 6.5p6 does not apply either. Therefore, the effective type of the object is the type of the lvalue used for the access, which is unsigned int, and the rules of 6.5p7 are satisfied.
Finally, per 6.2.6.1, assuming unsigned int has no trap representations, the memset operation has created the representation of some unspecified unsigned int value in each of b1 and b2. Therefore, if neither (A) nor (B) provokes undefined behavior, then the actual values in v1 and v2 are unspecified but they are equal.
Commentary:
The asymmetry of the "type-based aliasing" rules (that is, 6.5p7), permitting an object with any effective type to be accessed by an lvalue with character type, but not vice versa, is a continual source of confusion. The second case of 6.5p6 seems to have been added specifically to prevent its being undefined behavior to read a value initialized by memset (or, for that matter, calloc) but, because it only applies to objects with no declared type, is itself an additional source of confusion.
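To illustrate the direction that 6.5p7 does permit, here is a minimal sketch that inspects the bytes of an unsigned int through an lvalue of type unsigned char. The function name all_bytes_equal is hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the object *v equals
   byte, and 0 otherwise. Reading an unsigned int object through an
   unsigned char lvalue is the character-type case listed in 6.5p7. */
static int all_bytes_equal(const unsigned int *v, unsigned char byte)
{
    const unsigned char *p = (const unsigned char *)v;
    for (size_t i = 0; i < sizeof *v; i++) {
        if (p[i] != byte)
            return 0;
    }
    return 1;
}
```

The reverse direction, reading a declared unsigned char array through an unsigned int lvalue, is exactly what statement (A) does.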
On a superficial examination, I'd agree with your assessment (A is UB, B is fine), and can offer a concrete rationale for why that should be so (prior to the edit to include _Alignas()): Alignment.
The char[] on the stack can start at any address, whether that's a valid alignment for an unsigned int or not. In contrast, malloc() is required to return memory meeting the strictest alignment requirements of any native type on the platform in question.
The standard obviously doesn't want to impose alignment requirements on char[] beyond those of char, so it has to leave type-punned access to it as potentially undefined.
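As a rough way to observe the alignment point, the sketch below checks whether an address is a multiple of _Alignof(unsigned int). The helper name aligned_for_uint is hypothetical. It assumes that uintptr_t is available and that converting a pointer to it yields a plain numeric address; that conversion is implementation-defined, so treat this as a diagnostic on typical platforms rather than a portable guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p, viewed as an integer
   address, is a multiple of the alignment of unsigned int.
   Assumption: the pointer-to-uintptr_t conversion preserves the
   numeric address, which holds on common flat-memory platforms. */
static int aligned_for_uint(const void *p)
{
    return (uintptr_t)p % _Alignof(unsigned int) == 0;
}
```

A buffer declared with _Alignas(unsigned int), or a pointer returned by malloc(), should pass this check; a plain char[] may or may not, depending on where the compiler happens to place it.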