Read / write partially allocated aligned memory


There are a lot of questions about accessing unallocated memory, which is clearly undefined behavior. But what about the following corner case?

Consider the following struct, which is aligned to 16 bytes but whose only member occupies just 8 of those bytes:

struct alignas(16) A
{
    float data[2]; // the remaining 8 bytes are unallocated
};

Now we access 16 bytes of data using the aligned SSE load / store intrinsics:

__m128 test_load(const A &a)
{
    return _mm_load_ps(a.data);
}

void test_store(A &a, __m128 v)
{
    _mm_store_ps(a.data, v);
}

Is this also undefined behavior, and should I use explicit padding instead?

Also, since Intel intrinsics are not standard C++, is accessing a partly allocated but aligned memory block (no larger than the alignment) undefined behavior in standard C++?

I'm asking about both the intrinsic case and the standard C++ case; I'm interested in both of them.


There are 2 answers

Peter Cordes (best answer)

See also "Is it safe to read past the end of a buffer within the same page on x86 and x64?" The reading part of this question is basically a duplicate of that.

It's UB according to the ISO C++ standard, but I think read-only access like this does work safely (i.e. compile to the asm that you'd expect) on implementations that provide Intel's intrinsics (which are free to define whatever extra behaviour they want). It's definitely safe in asm, but the risk is that optimizing C++ compilers that turn UB into mis-compiled code might cause a problem if they can prove that there's nothing there to read. There's some discussion of that on the linked question.


Writing outside of objects is always bad. Don't do it, not even if you put back the same garbage you read earlier: a non-atomic load/store pair can be a problem depending on what data follows your struct (for example, it can overwrite a value that another thread stored there in between your load and your store).

The only time this is OK is in an array where you know what comes next, and that there is unused padding, e.g. writing out an array of 3-float structs using 16B stores overlapping by 4B. (Without alignas for over-alignment, so an array packs them together without padding.)
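To make the overlapping-store idea concrete, here is a minimal sketch. The names `Vec3` and `store_vec3_array` are made up for illustration, and it assumes the input vectors are already in registers or an aligned `__m128` array. Each 16B store spills its 4th lane onto the next element's `x`, which the next iteration overwrites; the last element gets a 12-byte store so nothing is written past the end of the array. Storing 16 bytes through `&out[i].x` is still not something ISO C++ blesses; this relies on the behaviour of compilers that provide the intrinsics.

```cpp
#include <immintrin.h>  // SSE intrinsics (x86 only)
#include <cstddef>

// Hypothetical 3-float struct: 12 bytes, no over-alignment, so arrays pack tightly.
struct Vec3 { float x, y, z; };
static_assert(sizeof(Vec3) == 12, "sketch assumes no padding inside Vec3");

// Write the low 3 lanes of vals[i] to out[i] for i in [0, n).
void store_vec3_array(Vec3 *out, const __m128 *vals, std::size_t n)
{
    if (n == 0) return;

    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Unaligned 16B store: lane 3 lands on out[i + 1].x,
        // which the next iteration (or the tail code below) overwrites.
        _mm_storeu_ps(&out[i].x, vals[i]);
    }

    // Last element: write exactly 12 bytes so we never touch memory after the array.
    const __m128 last = vals[n - 1];
    _mm_storel_pi(reinterpret_cast<__m64 *>(&out[n - 1].x), last);  // x and y
    _mm_store_ss(&out[n - 1].z, _mm_movehl_ps(last, last));         // z (lane 2)
}
```

The separate tail is the whole point: the overlapping store is only acceptable where you know the bytes it clobbers are yours and will be rewritten.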


A struct of 3 floats would be a much better example than 2 floats.

For this specific example (of 2 floats) you can just use MOVSD to do a 64-bit zero-extending load, and MOVSD or MOVLPS to do a 64-bit store of the low half of an __m128.
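A sketch of what that could look like with intrinsics, reusing the struct from the question (`load_low64` and `store_low64` are illustrative names, not anything standard). Note one substitution: for the load I use `_mm_loadl_epi64`, which normally compiles to MOVQ rather than MOVSD. It does the same 64-bit zero-extending load, and avoids dereferencing a `double*` that really points at floats. The store uses `_mm_storel_pi`, which normally compiles to MOVLPS.

```cpp
#include <immintrin.h>  // SSE/SSE2 intrinsics (x86 only)

struct alignas(16) A
{
    float data[2];
};

// 64-bit zero-extending load: lanes 0-1 come from a.data, lanes 2-3 are zero.
// Only the 8 bytes of a.data are read.
__m128 load_low64(const A &a)
{
    return _mm_castsi128_ps(
        _mm_loadl_epi64(reinterpret_cast<const __m128i *>(a.data)));
}

// 64-bit store of the low half of v; the upper two lanes are not written anywhere.
void store_low64(A &a, __m128 v)
{
    _mm_storel_pi(reinterpret_cast<__m64 *>(a.data), v);
}
```

Both functions touch only the 8 bytes that `data` actually occupies, so the question of what lives in the other 8 bytes never comes up.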

SergeyA

A language-lawyer answer to this is 'the question is moot'. _mm_load_ps is not defined in the standard, and it uses an asm instruction which is not defined in the standard either. C++ does not deal with this.

As for your second question: accessing unallocated memory from C++ this way is clearly undefined behavior. No object was placed in this memory, thus you can't access it.