I'm wondering whether any compiler (gcc, xlc, etc.) on Power8 supports the OpenMP SIMD constructs. I tried XL 13.1 but couldn't compile successfully; it probably doesn't support the simd construct yet.
I could compile with gcc 4.9.1 (with the flags -fopenmp -fopenmp-simd and -O1). I have included the two assembly outputs below for comparison.
Can I say that gcc 4.9 is able to generate AltiVec code? What should I do to optimize further? (I tried -O3 and restrict-qualified pointers.)
My code is very simple:
int *x, *y, *z;
x = (int*) malloc(n * sizeof(int));
y = (int*) malloc(n * sizeof(int));
z = (int*) malloc(n * sizeof(int));
#pragma omp simd
for (i = 0; i < n; ++i)
    z[i] = a * x[i] + y[i];
And the generated assembly is here:
.L7:
lwz 9,124(31)
extsw 9,9
std 9,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 8,104(31)
sldi 9,8,2
ld 10,152(31)
add 9,10,9
lwz 10,124(31)
extsw 10,10
std 10,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 7,104(31)
sldi 10,7,2
ld 8,136(31)
add 10,8,10
lwz 10,0(10)
extsw 10,10
lwz 8,132(31)
mullw 10,8,10
extsw 8,10
lwz 10,124(31)
extsw 10,10
std 10,104(31)
lfd 0,104(31)
stfd 0,104(31)
ld 7,104(31)
sldi 10,7,2
ld 7,144(31)
add 10,7,10
lwz 10,0(10)
extsw 10,10
add 10,8,10
extsw 10,10
stw 10,0(9)
lwz 9,124(31)
addi 9,9,1
stw 9,124(31)
GCC with -O1 -fopenmp-simd
.L7:
lwz 9,108(31)
mtvsrwa 0,9
mfvsrd 8,0
sldi 9,8,2
ld 10,136(31)
add 9,10,9
lwz 10,108(31)
mtvsrwa 0,10
mfvsrd 7,0
sldi 10,7,2
ld 8,120(31)
add 10,8,10
lwz 10,0(10)
extsw 10,10
lwz 8,116(31)
mullw 10,8,10
extsw 8,10
lwz 10,108(31)
mtvsrwa 0,10
mfvsrd 7,0
sldi 10,7,2
ld 7,128(31)
add 10,7,10
lwz 10,0(10)
extsw 10,10
add 10,8,10
extsw 10,10
stw 10,0(9)
lwz 9,108(31)
addi 9,9,1
stw 9,108(31)
To clarify and understand the details, I have one more application, an O(n^2) n-body code. This time my question concerns both compilers (gcc 4.9 and XL 13.1) and both architectures (Intel and Power).
I put all the code into a gist: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-input-c (input.c is the full version of the input code).
- Power8 & XLC: it reports "was not SIMD vectorized because it contains function calls" (there is a sqrtf call). That's reasonable. But in the asm code I can see xsnmsubmdp; is that normal? (Assembly: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-xlc-noinnersimd-asm)
- Power8 & gcc: I compiled it in two ways (with the omp simd construct and without). That changed my asm code; is that normal? (According to OpenMP, the code should not contain a function call.) (Assemblies: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-gcc-noinnersimd-asm & https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-power8-gcc-innersimd-asm)
- i7-4820K & gcc: I did the same test with omp simd and without it. The generated code differs as well. Does FMA affect this code block? (Assemblies: https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-i74820k-gcc-noinnersimd-asm & https://gist.github.com/grypp/8b9f0f0f98af78f4223e#file-i74820k-gcc-innersimd-asm)
Thanks in advance
The XL compiler on POWER Linux currently only supports a subset of the OpenMP 4.0 features. The SIMD construct feature is not supported at the moment, so the compiler will not recognize the construct in your source code.
However, if vectorization is what you're looking for then the good news is that the XL compiler should already automatically vectorize your code as long as you use at least the following optimization options
These options will enable high-order loop transformations along with POWER8 specific optimizations, including loop auto-vectorization for your loop.
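For reference, here is a minimal sketch of the loop from the question written so that an auto-vectorizer has an easy time with it. The function name saxpy_int and its signature are my own illustration, not something from the question; the restrict qualifiers are an assumption that the three arrays never overlap. A compiler without OpenMP SIMD support should simply ignore the pragma (possibly with a warning) and can still auto-vectorize the loop.

```c
#include <stddef.h>

/* Hypothetical helper (not from the question): computes z[i] = a * x[i] + y[i].
 * The restrict qualifiers promise the compiler that x, y and z do not alias,
 * which removes one common obstacle to vectorization. */
void saxpy_int(size_t n, int a,
               const int *restrict x,
               const int *restrict y,
               int *restrict z)
{
    size_t i;

    /* Honored by compilers with OpenMP SIMD support (e.g. gcc -fopenmp-simd);
     * other compilers skip the unknown pragma and fall back to their own
     * auto-vectorizer, if one is enabled. */
    #pragma omp simd
    for (i = 0; i < n; ++i)
        z[i] = a * x[i] + y[i];
}
```

With gcc, something like -O3 -mcpu=power8 -fopenmp-simd would be a reasonable starting point for this function; whether the loop is actually vectorized still has to be confirmed in the generated assembly or in the compiler's report.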
Afterwards, you should see some VMX & VSX instructions in the generated assembly code similar to the following:
By the way, you can also get an optimization report from the XL compilers by using the -qreport option. This will explain which loops were vectorized and which loops were not and for what reason. e.g.
or
Hope this helps!