Reinterpret struct with members of the same type as an array in a standard compliant way

In various 3d math codebases I sometimes encounter something like this:

struct vec {
    float x, y, z;

    float& operator[](std::size_t i)
    {
        assert(i < 3);
        return (&x)[i];
    }
};

Which, AFAIK, is illegal because implementations are allowed to spuriously add padding between members, even if they are of the same type, though none will do so in practice.

Can this be made legal by imposing constraints via static_asserts?

static_assert(sizeof(vec) == sizeof(float) * 3);

I.e., does the static_assert not being triggered imply that operator[] does what is expected and doesn't invoke UB at runtime?

There are 5 answers

9
Brian Bi On BEST ANSWER

No, it is not legal because when adding an integer to a pointer, the following applies ([expr.add]/5):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

y may occupy the memory location one past the end of x (considered as an array with one element), so adding 1 to &x is defined, but adding 2 to &x is undefined.
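To make the rule concrete, here is a minimal sketch contrasting a real array with a lone float, which pointer arithmetic treats as an array of one element. The helper name pointer_arithmetic_demo is hypothetical and not part of the answer; the undefined operations are left commented out on purpose.

```cpp
#include <cassert>

struct vec {
    float x, y, z;
};

// Hypothetical helper: performs only pointer arithmetic that [expr.add] defines.
inline void pointer_arithmetic_demo()
{
    float arr[3] = {1.0f, 2.0f, 3.0f};
    float* last = arr + 2;     // OK: points to an element of arr
    float* arr_end = arr + 3;  // OK: one past the last element (must not be dereferenced)
    assert(*last == 3.0f);
    assert(arr_end - arr == 3);

    vec v{1.0f, 2.0f, 3.0f};
    float* x_end = &v.x + 1;   // OK: v.x counts as an array of one float
    assert(x_end - &v.x == 1);
    // float* bad = &v.x + 2;  // undefined: beyond one past the end of v.x
    // float  y   = *x_end;    // undefined: x_end is a past-the-end pointer
}
```

Forming &v.x + 1 is fine, but that pointer may not be used to read v.y, which is why (&x)[1] and (&x)[2] are both outside what the standard defines.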

0
Malcolm McLean On

Type aliasing (use of more than one type for essentially the same data) is a huge problem in C++. If you keep member functions out of structs and maintain them as PODs, things ought to work. But

  static_assert(sizeof(vec) == sizeof(float) * 3);

can't make accessing one type as another technically legal. In practice, of course, there will be no padding, but C++ isn't clever enough to realise that a vec is an array of floats, that an array of vecs is an array of floats whose length is constrained to be a multiple of three, and that casting &vecasarray[0] to a vec * is legal but casting &vecasarray[1] is illegal.
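If the goal is only to hand an array of vecs to code that expects plain floats, copying the members sidesteps the aliasing question entirely, at the cost of extra storage. A minimal sketch, assuming a hypothetical helper named flatten that is not part of the answer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vec {
    float x, y, z;
};

// Hypothetical helper: flattens n vecs into 3 * n floats by copying each member.
// No pointer to a vec is ever reinterpreted as a pointer to float.
inline std::vector<float> flatten(const vec* vs, std::size_t n)
{
    assert(vs != nullptr || n == 0);
    std::vector<float> out;
    out.reserve(n * 3);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(vs[i].x);
        out.push_back(vs[i].y);
        out.push_back(vs[i].z);
    }
    return out;
}
```

This trades the zero-copy view for code whose behavior does not depend on layout at all.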

1
Serge Ballesta On

According to the standard, it is clearly Undefined Behaviour, because you either do pointer arithmetic outside of an array or alias the content of a struct and an array.

The problem is that 3D math code can be used intensively, so low-level optimization makes sense. The C++-conformant way would be to store the array directly and use accessors or references to individual members of the array. But neither of those two options is perfect:

  • accessors:

    struct vec {
    private:
        float arr[3];
    public:
        float& operator[](std::size_t i)
        {
            assert(i < 3);
            return arr[i];
        }
        float& x() & { return arr[0];}
        float& y() & { return arr[1];}
        float& z() & { return arr[2];}
    };
    

    The problem is that using a function as an lvalue is not natural for old C programmers: v.x() = 1.0; is indeed correct, but I'd rather avoid a library that forces me to write that. Of course we could use setters, but where possible I prefer writing v.x = 1.0; to v.setx(1.0);, because of the common idiom v.x = v.z = 1.0; v.y = 2.0;. It is only my opinion, but I find it neater than v.x() = v.z() = 1.0; v.y() = 2.0; or v.setx(v.setz(1.0)); v.sety(2.0);.

  • references

    struct vec {
    private:
        float arr[3];
    public:
        float& operator[](std::size_t i)
        {
            assert(i < 3);
            return arr[i];
        }
        float& x;
        float& y;
        float& z;
        vec(): x(arr[0]), y(arr[1]), z(arr[2]) {}
    };
    

    Nice! We can write v.x and v[0], both representing the same memory... unfortunately, compilers are still not smart enough to see that the refs are just aliases for an in-struct array, so each reference takes up storage and the struct is at least twice the size of the array!

For those reasons, the incorrect aliasing is still commonly used...
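Besides the size overhead, the reference-based variant has a copy pitfall worth seeing in code. The sketch below mirrors that struct under the hypothetical name vec_refs, and zero-initializes arr (an addition for this sketch) so that copying it is well-defined; copy_pitfall_demo is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the reference-based struct; arr is zero-initialized here
// (an assumption of this sketch) so that copying it reads no indeterminate values.
struct vec_refs {
private:
    float arr[3] = {};
public:
    float& operator[](std::size_t i)
    {
        assert(i < 3);
        return arr[i];
    }
    float& x;
    float& y;
    float& z;
    vec_refs() : x(arr[0]), y(arr[1]), z(arr[2]) {}
};

// Hypothetical helper: shows what the implicitly generated copy constructor does.
inline void copy_pitfall_demo()
{
    vec_refs a;
    a.x = 1.0f;
    vec_refs b = a;  // implicit copy: b.arr is copied, but b.x still refers to a's array
    b.x = 5.0f;      // writes through to a, not to b
    assert(a[0] == 5.0f);
    assert(b[0] == 1.0f);
}
```

A user-provided copy constructor that rebinds the references to the new object's own array would avoid the surprise, but the size overhead remains, and copy assignment is implicitly deleted because of the reference members.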

3
Edy On

How about storing the data members as an array and accessing them by name?

struct vec {
    float p[3];

    float& x() { return p[0]; }
    float& y() { return p[1]; }
    float& z() { return p[2]; }

    float& operator[](std::size_t i)
    {
        assert(i < 3);
        return p[i];
    }
};

EDIT: For the original approach, if x, y and z are the only member variables, then in practice the struct will be the size of 3 floats, and a static_assert can be used to check that operator[] stays within the bounds of the struct.

See also: C++ struct member memory allocation

EDIT 2: As Brian said in another answer, (&x)[i] is itself undefined behavior according to the standard. However, given that the 3 floats are the only data members, the code should be safe in practice in this context.

To be pedantic about syntactic correctness:

struct vec {
  float x, y, z;
  float* const p = &x;

  float& operator[](std::size_t i) {
    assert(i < 3);
    return p[i];
  }
};

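Although this will increase the size of each vec by the size of a pointer. Note that p[i] performs the same pointer arithmetic as (&x)[i], so this changes the spelling, not the behavior under the standard.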

2
Christophe On

You can never be sure that this will work

There is no guarantee of contiguity of subsequent members, even if this will frequently work perfectly in practice thanks to usual float alignment properties and permissive pointer arithmetic.

This is laid down in the following clause of the C++ standard:

[class.mem]/18: Non-static data-members (...) with the same access control are allocated so that later members have higher addresses within the class object. Implementation alignment requirements might cause two adjacent members not to be allocated after each other.

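There is no way to make this legal using static_assert or alignas constraints. All you can do is make compilation fail when the elements are not contiguous, by checking the member offsets after the struct definition (offsetof requires <cstddef> and is well defined here because vec is a standard-layout class):

    static_assert(offsetof(vec, y) == offsetof(vec, x) + sizeof(float) &&
                  offsetof(vec, z) == offsetof(vec, y) + sizeof(float), "PADDING in vector");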

But you can reimplement the operator to make it standard compliant

A safe alternative would be to reimplement operator[] to get rid of the contiguity requirement for the three members:

struct vec {
    float x,y,z; 

    float& operator[](size_t i)
    {
        assert(i<3); 
        if (i==0)     // optimizing compiler will make this as efficient as your original code
            return x; 
        else if (i==1) 
            return y; 
        else return z;
    }
};

Note that an optimizing compiler will generate very similar code for both the reimplementation and your original version, so prefer the compliant version.
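The same idea can also be written with a table of pointers to members instead of an if/else chain. This is a sketch of a commonly used idiom rather than something from the answer itself; the table name members is illustrative.

```cpp
#include <cassert>
#include <cstddef>

struct vec {
    float x, y, z;

    float& operator[](std::size_t i)
    {
        // Table of pointers to members: selects x, y or z by index without
        // assuming anything about how the members are laid out in memory.
        static constexpr float vec::* members[] = { &vec::x, &vec::y, &vec::z };
        assert(i < 3);
        return this->*members[i];
    }
};
```

Usage is unchanged: v[1] = 7.0f; assigns to v.y, and v.x, v.y, v.z remain directly accessible. Whether the generated code matches the if/else version depends on the compiler and optimization level.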