There are a lot of questions about accessing unallocated memory, which is clearly undefined behavior. But what about the following corner case?
Consider the following struct, which is aligned to 16 bytes but occupies only 8 of them:
struct alignas(16) A
{
    float data[2]; // the remaining 8 bytes are unallocated
};
Now we access 16 bytes of data by SSE aligned load / store intrinsics:
__m128 test_load(const A &a)
{
    return _mm_load_ps(a.data);
}

void test_store(A &a, __m128 v)
{
    _mm_store_ps(a.data, v);
}
Is this also undefined behavior and should I use padding instead?
Anyway, since Intel intrinsics are not standard C++, is accessing a partly allocated but aligned memory block (not greater than the size of the alignment) undefined behavior in standard C++?
I'm asking about both the intrinsic case and the standard C++ case; I'm interested in both of them.
See also "Is it safe to read past the end of a buffer within the same page on x86 and x64?" The reading part of this question is basically a duplicate of that.
It's UB according to the ISO C++ standard, but I think read-only access like this does work safely (i.e. compile to the asm that you'd expect) on implementations that provide Intel's intrinsics (which are free to define whatever extra behaviour they want). It's definitely safe in asm, but the risk is that optimizing C++ compilers that turn UB into mis-compiled code might cause a problem if they can prove that there's nothing there to read. There's some discussion of that on the linked question.
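Writing outside of objects is always bad. Don't do it, not even if you put back the same garbage you read earlier: a non-atomic load/store pair can be a problem depending on what data follows your struct (for example, if another thread modifies that data between your load and your store, your store silently undoes its update).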
The only time this is ok is in an array where you know what comes next, and that there is unused padding, e.g. writing out an array of 3-float structs using 16B stores overlapping by 4B. (Without alignas for over-alignment, so an array packs them together without padding.)
A struct of 3 floats would be a much better example than 2 floats.
For this specific example (of 2 floats) you can just use MOVSD to do a 64-bit zero-extending load, and MOVSD or MOVLPS to do a 64-bit store of the low half of an __m128.
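As a rough sketch of both suggestions, assuming an x86 target with SSE and a compiler that provides Intel's intrinsics: load2 and store2 touch exactly the 8 bytes of data, and store_vec3_array shows the overlapping-store idea for a tightly packed array. The names load2, store2, Vec3 and store_vec3_array are made up for illustration and are not from the question. The 64-bit accesses are written with _mm_loadl_pi / _mm_storel_pi; the compiler is free to emit MOVLPS, MOVSD or MOVQ for them. The overlapping 16-byte stores still go past the end of each individual element, so they rely on the implementation doing the obvious thing rather than on anything ISO C++ guarantees.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadl_pi, _mm_storel_pi, _mm_storeu_ps, ...

struct alignas(16) A
{
    float data[2];
};

// Hypothetical 3-float struct WITHOUT alignas, so an array of them has no padding.
struct Vec3
{
    float x, y, z;
};
static_assert(sizeof(Vec3) == 12, "sketch assumes tightly packed Vec3");

// 64-bit load of the two floats; the upper two lanes of the result are zero.
inline __m128 load2(const A &a)
{
    return _mm_loadl_pi(_mm_setzero_ps(), reinterpret_cast<const __m64 *>(a.data));
}

// 64-bit store of the low half of v; nothing outside a.data is written.
inline void store2(A &a, __m128 v)
{
    _mm_storel_pi(reinterpret_cast<__m64 *>(a.data), v);
}

// Store n vectors (lane 3 is don't-care) into n packed Vec3 elements.
inline void store_vec3_array(Vec3 *dst, const __m128 *src, std::size_t n)
{
    if (n == 0)
        return;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // 16-byte unaligned store: the 4th lane spills into dst[i + 1].x,
        // which the next iteration (or the tail below) overwrites.
        _mm_storeu_ps(&dst[i].x, src[i]);
    }
    // Last element: write exactly 12 bytes (8 + 4) so nothing after the array is touched.
    const __m128 v = src[n - 1];
    _mm_storel_pi(reinterpret_cast<__m64 *>(&dst[n - 1].x), v);
    _mm_store_ss(&dst[n - 1].z, _mm_movehl_ps(v, v));  // lane 2 moved down to lane 0
}
```

The overlapping version is only safe single-threaded or when no other thread touches the neighbouring elements while the loop runs, for the same reason given above for writes outside an object.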