I wrote some code with static arrays and it vectorizes just fine.
float data[1024] __attribute__((aligned(16)));
I would like to make the arrays dynamically allocated. I tried doing something like this:
float *data = (float*) aligned_alloc(16, size*sizeof(float));
But the compiler (GCC 4.9.2) can no longer vectorize the code. I assume this is because it doesn't know that the pointer data is 16-byte aligned. I am getting messages like:
note: Unknown alignment for access: *_43
I have tried adding this line before the data is used, but it doesn't seem to do anything:
data = (float*) __builtin_assume_aligned(data, 16);
Using a different variable and restrict did not help:
float* __restrict__ align_data = (float*) __builtin_assume_aligned(data,16);
Example:
#include <iostream>
#include <stdlib.h>
#include <math.h>

#define SIZE 1024
#define DYNAMIC 0
#define A16 __attribute__((aligned(16)))
#define DA16 (float*) aligned_alloc(16, size*sizeof(float))

class Test{
public:
    int size;
#if DYNAMIC
    float *pos;
    float *vel;
    float *alpha;
    float *k_inv;
    float *osc_sin;
    float *osc_cos;
    float *dosc1;
    float *dosc2;
#else
    float pos[SIZE] A16;
    float vel[SIZE] A16;
    float alpha[SIZE] A16;
    float k_inv[SIZE] A16;
    float osc_sin[SIZE] A16;
    float osc_cos[SIZE] A16;
    float dosc1[SIZE] A16;
    float dosc2[SIZE] A16;
#endif
    Test(int arr_size){
        size = arr_size;
#if DYNAMIC
        pos = DA16;
        vel = DA16;
        alpha = DA16;
        k_inv = DA16;
        osc_sin = DA16;
        osc_cos = DA16;
        dosc1 = DA16;
        dosc2 = DA16;
#endif
    }
    void compute(){
        for (int i=0; i<size; i++){
            float lambda = .67891*k_inv[i],
                  omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
                  diff2 = pos[i] - omega,
                  diff1 = vel[i] - lambda + alpha[i]*diff2;
            pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
            vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
        }
    }
};

int main(int argc, char** argv){
    Test t(SIZE);
    t.compute();
    std::cout << t.pos[10] << std::endl;
    std::cout << t.vel[10] << std::endl;
}
Here is how I am compiling:
g++ -o test test.cpp -O3 -march=native -ffast-math -fopt-info-optimized
When DYNAMIC is set to 0, it outputs:
test.cpp:46:4: note: loop vectorized
but when it is set to 1, it outputs nothing.
The compiler isn't vectorizing the loop because it can't determine that the dynamically allocated pointers don't alias each other. A simple way to allow your sample code to be vectorized is to pass the --param vect-max-version-for-alias-checks=1000 option. This will allow the compiler to emit all the checks necessary to see if the pointers are actually aliased.

Another simple solution to allow your example code to be vectorized is to rename main, as suggested by Marc Glisse in his comment. Functions named main apparently have certain optimizations disabled. Named something else, GCC 4.9.2 can track the use of this->foo (and the other pointer members) in compute back to their allocations in Test().

However, I assume something other than your class being used in a function named main prevented your code from being vectorized in your real code. A more general solution that allows your code to be vectorized without aliasing or alignment checks is to use the restrict keyword and the aligned attribute. Something like this:

The noinline attribute is critical, otherwise inlining can cause the pointers to lose their restrictedness and alignedness. The compiler seems to ignore the restrict keyword in contexts other than function parameters.
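As a rough sketch of the restrict-plus-aligned approach described above, the loop can be moved into a separate noinline function whose parameters are __restrict__ pointers to a 16-byte-aligned float typedef. The names float_a16, compute_kernel, alloc_floats_a16 and run_example are made up for this illustration and are not from the question. The sketch relies on GCC-specific behavior (the aligned attribute on a typedef and the __restrict__ keyword), and I have not checked which compiler versions actually vectorize it, so confirm with -fopt-info-optimized.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical typedef: a pointer to float_a16 promises GCC that the
// pointed-to buffer starts on a 16-byte boundary.
typedef float float_a16 __attribute__((aligned(16)));

// noinline keeps the restrict and alignment promises attached to the
// parameters; the loop body is the same arithmetic as Test::compute().
__attribute__((noinline))
static void compute_kernel(int size,
                           float_a16* __restrict__ pos,
                           float_a16* __restrict__ vel,
                           const float_a16* __restrict__ alpha,
                           const float_a16* __restrict__ k_inv,
                           const float_a16* __restrict__ osc_sin,
                           const float_a16* __restrict__ osc_cos,
                           const float_a16* __restrict__ dosc1,
                           const float_a16* __restrict__ dosc2)
{
    for (int i = 0; i < size; i++) {
        float lambda = .67891*k_inv[i],
              omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
              diff2 = pos[i] - omega,
              diff1 = vel[i] - lambda + alpha[i]*diff2;
        pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
        vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
    }
}

// Hypothetical helper: allocate n floats on a 16-byte boundary. The byte
// count is rounded up to a multiple of 16, as aligned_alloc may require.
static float* alloc_floats_a16(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15) / 16 * 16;
    return static_cast<float*>(aligned_alloc(16, bytes));
}

// Hypothetical driver. k_inv is zero, so lambda and omega vanish and the
// results can be checked by hand: pos becomes 7 and vel becomes 3.
static bool run_example(int size)
{
    // Order: pos, vel, alpha, k_inv, osc_sin, osc_cos, dosc1, dosc2
    const float init[8] = {2.f, 3.f, 1.f, 0.f, 1.f, 1.f, 1.f, 1.f};
    float* buf[8];
    for (int a = 0; a < 8; a++) {
        buf[a] = alloc_floats_a16(size);
        assert(buf[a] != NULL);
        for (int i = 0; i < size; i++)
            buf[a][i] = init[a];
    }
    compute_kernel(size, buf[0], buf[1], buf[2], buf[3],
                   buf[4], buf[5], buf[6], buf[7]);
    bool ok = true;
    for (int i = 0; i < size; i++)
        ok = ok && buf[0][i] == 7.0f && buf[1][i] == 3.0f;
    for (int a = 0; a < 8; a++)
        free(buf[a]);
    return ok;
}
```

In the question's class, compute() would then just forward its members: compute_kernel(size, pos, vel, alpha, k_inv, osc_sin, osc_cos, dosc1, dosc2). The inputs are marked const here only because the loop never writes to them; that is a choice made for this sketch.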