Tell C++ that pointer data is 16 byte aligned

I wrote some code with static arrays and it vectorizes just fine.

float data[1024] __attribute__((aligned(16)));

I would like to make the arrays dynamically allocated. I tried doing something like this:

float *data = (float*) aligned_alloc(16, size*sizeof(float));

But the compiler (GCC 4.9.2) can no longer vectorize the code. I assume this is because it doesn't know that the pointer data is 16-byte aligned. I am getting messages like:

note: Unknown alignment for access: *_43

I have tried adding this line before the data is used, but it doesn't seem to do anything:

data = (float*) __builtin_assume_aligned(data, 16);

Using a different variable and restrict did not help:

float* __restrict__ align_data = (float*) __builtin_assume_aligned(data,16);

Example:

#include <iostream>
#include <stdlib.h>
#include <math.h>

#define SIZE 1024
#define DYNAMIC 0
#define A16 __attribute__((aligned(16)))
#define DA16 (float*) aligned_alloc(16, size*sizeof(float))

class Test{
public:
    int size;
#if DYNAMIC
    float *pos;
    float *vel;
    float *alpha;
    float *k_inv;
    float *osc_sin;
    float *osc_cos;
    float *dosc1;
    float *dosc2;
#else
    float pos[SIZE] A16;
    float vel[SIZE] A16;
    float alpha[SIZE] A16;
    float k_inv[SIZE] A16;
    float osc_sin[SIZE] A16;
    float osc_cos[SIZE] A16;
    float dosc1[SIZE] A16;
    float dosc2[SIZE] A16;
#endif
    Test(int arr_size){
        size = arr_size;
#if DYNAMIC
        pos = DA16;
        vel = DA16;
        alpha = DA16;
        k_inv = DA16;
        osc_sin = DA16;
        osc_cos = DA16;
        dosc1 = DA16;
        dosc2 = DA16;
#endif
    }
    void compute(){
        for (int i=0; i<size; i++){
            float lambda = .67891*k_inv[i],
                omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
                diff2 = pos[i] - omega,
                diff1 = vel[i] - lambda + alpha[i]*diff2;
            pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
            vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
        }
    }
};

int main(int argc, char** argv){
    Test t(SIZE);
    t.compute();
    std::cout << t.pos[10] << std::endl;
    std::cout << t.vel[10] << std::endl;
}

Here is how I am compiling:

g++ -o test test.cpp -O3 -march=native -ffast-math -fopt-info-optimized

When DYNAMIC is set to 0, it outputs:

test.cpp:46:4: note: loop vectorized

but when it is set to 1 it outputs nothing.

Answer by Ross Ridge (best answer)

The compiler isn't vectorizing the loop because it can't determine that the dynamically allocated pointers don't alias each other. A simple way to allow your sample code to be vectorized is to pass the --param vect-max-version-for-alias-checks=1000 option, which raises the limit on the number of run-time alias checks the vectorizer may generate. This allows the compiler to emit all the checks necessary to see if the pointers are actually aliased.
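What such run-time alias checks amount to can be sketched by hand. The following is only an illustration of the idea (test for overlap once, then pick a loop), not what GCC actually generates; the function names and the simple two-array loop are hypothetical and are not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when [a, a+n) and [b, b+n) share no bytes.
static bool disjoint(const float* a, const float* b, std::size_t n) {
    const std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(float);
    return pa + bytes <= pb || pb + bytes <= pa;
}

// Version for the no-overlap case: the restrict qualifiers on the
// parameters promise the compiler that dst and src never alias.
static void scale_add_noalias(float* __restrict__ dst,
                              const float* __restrict__ src,
                              float scale, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] += scale * src[i];
}

// Version for the overlapping case: no promises, so the compiler has to
// preserve the element-by-element order of loads and stores.
static void scale_add_may_alias(float* dst, const float* src,
                                float scale, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] += scale * src[i];
}

// One overlap test up front selects the loop, which is roughly what a
// compiler-generated alias check does for each pair of pointers.
void scale_add(float* dst, const float* src, float scale, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (disjoint(dst, src, n))
        scale_add_noalias(dst, src, scale, n);
    else
        scale_add_may_alias(dst, src, scale, n);
}
```

With eight arrays, as in the question, every stored-to array has to be checked against the others, so the number of checks grows quickly; that is presumably why the limit needs to be raised for the sample code.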

Another simple way to get your example code vectorized is to rename main, as suggested by Marc Glisse in his comment. Functions named main apparently have certain optimizations disabled. With the function named something else, GCC 4.9.2 can track the use of this->foo (and the other pointer members) in compute back to their allocations in Test().
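In practice, "renaming main" means moving the work into an ordinary function that main merely calls. A minimal sketch follows, assuming a cut-down stand-in for the question's class; MiniTest, run and the simplified loop are hypothetical, and whether the loop actually gets vectorized depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>

// Hypothetical stand-in for the question's Test class, reduced to two arrays.
struct MiniTest {
    int size;
    float* pos;
    float* vel;

    explicit MiniTest(int n) : size(n) {
        // n is a multiple of 4 so the byte count is a multiple of 16,
        // which aligned_alloc expects.
        assert(n > 0 && n % 4 == 0);
        pos = static_cast<float*>(aligned_alloc(16, n * sizeof(float)));
        vel = static_cast<float*>(aligned_alloc(16, n * sizeof(float)));
        assert(pos != nullptr && vel != nullptr);
        for (int i = 0; i < n; i++) {
            pos[i] = static_cast<float>(i);
            vel[i] = 1.0f;
        }
    }
    ~MiniTest() { free(pos); free(vel); }
    MiniTest(const MiniTest&) = delete;
    MiniTest& operator=(const MiniTest&) = delete;

    // Simplified loop, not the question's formula.
    void compute() {
        for (int i = 0; i < size; i++)
            pos[i] += 0.5f * vel[i];
    }
};

// The work that used to sit directly in main() lives in an ordinary
// function, so it is not subject to whatever special treatment main gets.
float run(int n, int probe) {
    MiniTest t(n);
    t.compute();
    return t.pos[probe];
}

// main() would then only forward to it, for example:
//   int main() { std::cout << run(1024, 10) << std::endl; }
```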

However, I assume that in your real code something other than your class being used in a function named main prevented vectorization. A more general solution, which allows your code to be vectorized without aliasing or alignment checks, is to use the restrict keyword and the aligned attribute. Something like this:

typedef float __attribute__((aligned(16))) float_a16;

__attribute__((noinline))
static void _compute(float_a16 * __restrict__ pos,
         float_a16 * __restrict__ vel,
         float_a16 * __restrict__ alpha,
         float_a16 * __restrict__ k_inv,
         float_a16 * __restrict__ osc_sin,
         float_a16 * __restrict__ osc_cos,
         float_a16 * __restrict__ dosc1,
         float_a16 * __restrict__ dosc2,
         int size) {
    for (int i=0; i<size; i++){
        float lambda = .67891*k_inv[i],
            omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
            diff2 = pos[i] - omega,
            diff1 = vel[i] - lambda + alpha[i]*diff2;
        pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
        vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
    }
}

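// Member function of Test: replaces the original compute() and forwards
// the pointer members to the free function above.
void compute() {
    _compute(pos, vel, alpha, k_inv, osc_sin, osc_cos, dosc1, dosc2,
         size);
}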

The noinline attribute is critical; otherwise, inlining can cause the pointers to lose their restrict and alignment properties. The compiler seems to ignore the restrict keyword in contexts other than function parameters.
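For reference, here is a small self-contained variant of the same pattern: allocation with aligned_alloc, a noinline kernel with restrict-qualified parameters, and a thin wrapper. It is a sketch under assumptions, not the code above: the names alloc_a16, axpy_a16 and axpy are hypothetical, the loop is deliberately simpler than the question's, and it states the alignment with __builtin_assume_aligned inside the kernel instead of an aligned typedef. Whether a given GCC version vectorizes it without checks should be confirmed with -fopt-info-optimized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Hypothetical helper: allocate n floats on a 16-byte boundary.
// aligned_alloc expects the size to be a multiple of the alignment,
// so the byte count is rounded up to the next multiple of 16.
static float* alloc_a16(std::size_t n) {
    const std::size_t bytes = (n * sizeof(float) + 15) / 16 * 16;
    return static_cast<float*>(aligned_alloc(16, bytes));
}

// Kernel kept out of line so the restrict qualifiers on the parameters
// stay visible to the optimizer, as the answer recommends.
__attribute__((noinline))
static void axpy_a16(float* __restrict__ y, const float* __restrict__ x,
                     float a, int n) {
    // Tell the compiler both pointers are 16-byte aligned.
    y = static_cast<float*>(__builtin_assume_aligned(y, 16));
    x = static_cast<const float*>(__builtin_assume_aligned(x, 16));
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

// Wrapper that checks the alignment promise in debug builds before
// handing the pointers to the kernel.
void axpy(float* y, const float* x, float a, int n) {
    assert(reinterpret_cast<std::uintptr_t>(y) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(x) % 16 == 0);
    axpy_a16(y, x, a, n);
}
```

The caller remains responsible for the promises made here: if the arrays overlap or are not actually 16-byte aligned, the behavior is undefined.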