I'm struggling with GCC inline assembly on PowerPC. The program compiles fine with -g2 -O3, but fails to compile with -g3 -O0. The problem is that I need to observe it under the debugger, so I need symbols without optimizations.
Here is the program:
$ cat test.cxx
#include <altivec.h>
#undef vector
typedef __vector unsigned char uint8x16_p;
uint8x16_p VectorFastLoad8(const void* p)
{
    long offset = 0;
    uint8x16_p res;
    __asm(" lxvd2x %x0, %1, %2 \n\t"
        : "=wa" (res)
        : "g" (p), "g" (offset/4), "Z" (*(const char (*)[16]) p));
    return res;
}
And here's the error. (The error has existed since "PowerPC vec_xl_be replacement using inline assembly", but I have been able to ignore it until now.)
$ g++ -g3 -O0 -mcpu=power8 test.cxx -c
/home/test/tmp/ccWvBTN4.s: Assembler messages:
/home/test/tmp/ccWvBTN4.s:31: Error: operand out of range (64 is not between 0 and 31)
/home/test/tmp/ccWvBTN4.s:31: Error: syntax error; found `(', expected `,'
/home/test/tmp/ccWvBTN4.s:31: Error: junk at end of line: `(31),32(31)'
I believe this is the sore spot from the *.s listing:
#APP
# 12 "test.cxx" 1
lxvd2x 0, 64(31), 32(31)
There are some similar issues reported when using lwz, but I have not found one discussing problems with lxvd2x.
What is the problem and how do I fix it?
Here's the head of the *.s file:
$ head -n 40 test.s
.file "test.cxx"
.abiversion 2
.section ".toc","aw"
.align 3
.section ".text"
.machine power8
.Ltext0:
.align 2
.globl _Z15VectorFastLoad8PKv
.type _Z15VectorFastLoad8PKv, @function
_Z15VectorFastLoad8PKv:
.LFB0:
.file 1 "test.cxx"
.loc 1 7 0
.cfi_startproc
std 31,-8(1)
stdu 1,-96(1)
.cfi_def_cfa_offset 96
.cfi_offset 31, -8
mr 31,1
.cfi_def_cfa_register 31
std 3,64(31)
.LBB2:
.loc 1 8 0
li 9,0
std 9,32(31)
.loc 1 12 0
ld 9,64(31)
#APP
# 12 "test.cxx" 1
lxvd2x 0, 64(31), 32(31)
# 0 "" 2
#NO_APP
xxpermdi 0,0,0,2
li 9,48
stxvd2x 0,31,9
.loc 1 13 0
li 9,48
lxvd2x 0,31,9
Here's the code generated at -O3:
$ g++ -g3 -O3 -mcpu=power8 test.cxx -save-temps -c
$ objdump --disassemble test.o | c++filt
test.o: file format elf64-powerpcle
Disassembly of section .text:
0000000000000000 <VectorFastLoad8(void const*)>:
0: 99 06 43 7c lxvd2x vs34,r3,r0
4: 20 00 80 4e blr
8: 00 00 00 00 .long 0x0
c: 00 09 00 00 .long 0x900
10: 00 00 00 00 .long 0x0
The issue is that the generated asm has register+offset operands for RA and RB, but the lxvd2x instruction only takes direct register addresses (i.e., no offsets).

It looks like you've got your constraints wrong there. Looking at the inline asm:
Firstly, you have one output operand and three input operands (so four in total), but only three operands used in your template.
I'm assuming that your function reads directly from *p, and it doesn't clobber anything, so it looks like the fourth operand, "Z" (*(const char (*)[16]) p), is an unused operand for indicating a potential memory access (more on that below). We'll keep it simple for now and drop it, leaving just "=wa" (res) as the output and "g" (p), "g" (offset/4) as the inputs.

Compiling that, I still get an offset used for RA and/or RB.
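Looking at the docs for the "g" constraint, we see that it allows any register, memory, or immediate integer operand, except for registers that are not general registers. That is why GCC is free to hand the template a stack slot such as 64(31) at -O0.

However, we can't provide a memory operand here; only a register (without offset) is allowed. If we change both constraints to "r", then for me this compiles to a valid lxvd2x invocation, which the assembler happily accepts.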
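As a rough sketch of that intermediate step, the function with both address operands switched to "r" might look like the following. The `#else` branches (a stand-in struct and a `memcpy`) are an assumption added only so the snippet also builds on non-POWER hosts; they are not part of the fix.

```cpp
#include <cstring>

#if defined(__powerpc64__) && defined(__VSX__)
# include <altivec.h>
# undef vector
typedef __vector unsigned char uint8x16_p;
#else
// Assumption: stand-in type so the sketch builds on non-POWER hosts.
struct uint8x16_p { unsigned char b[16]; };
#endif

uint8x16_p VectorFastLoad8(const void* p)
{
    uint8x16_p res;
#if defined(__powerpc64__) && defined(__VSX__)
    long offset = 0;
    // "r" forces both address operands into general-purpose registers,
    // so the template can no longer expand to something like 64(31).
    // Note: there is no memory operand yet, so the compiler is not told
    // that the asm reads from *p.
    __asm(" lxvd2x %x0, %1, %2 \n\t"
        : "=wa" (res)
        : "r" (p), "r" (offset/4));
#else
    std::memcpy(&res, p, sizeof(res));  // portable fallback, not part of the fix
#endif
    return res;
}
```

This is deliberately incomplete: without a memory operand or clobber, the compiler does not know the asm depends on the bytes at p.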
Now, as @PeterCordes has commented, this example no longer indicates that it may access memory, so we should restore that memory input dependency by adding the "Z" (*(const char (*)[16]) p) operand back after the two "r" inputs.

In effect, all we've done is alter the constraints from "g" to "r", forcing the compiler to use non-offset register operands.