I have a problem that has been bugging me for several days now...
From C, I call a function that is implemented in ARM assembly on a Raspberry Pi and uses the NEON unit. Its signature is:
void doStuff(const uint32_t key[4])
I can load all the values into d-registers using VLD4.32 {d6-d9}, [r0].
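For reference, VLD4.32 with four D registers is a de-interleaving load: it reads two 4-word structures (8 x 32 bit, 32 bytes in total) and puts element r of each structure into register d(6 + r), one structure per lane. So lane 0 of d6-d9 holds key[0]..key[3], while lane 1 comes from the 16 bytes after a 4-element array. The plain-C model below is only a sketch of that lane layout; the function name model_vld4_32 is hypothetical and d[r][lane] stands for lane `lane` of d(6 + r).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of "VLD4.32 {d6-d9}, [rN]" (hypothetical helper).
 * Reads 8 consecutive 32-bit words from mem and de-interleaves them:
 * d[r][lane] models lane `lane` of register d(6 + r).
 * mem must point to at least 8 readable uint32_t values. */
void model_vld4_32(uint32_t d[4][2], const uint32_t mem[8])
{
    assert(d != NULL && mem != NULL);
    for (size_t lane = 0; lane < 2; ++lane) {   /* one 4-word structure per lane */
        for (size_t r = 0; r < 4; ++r) {        /* element r goes to d(6 + r) */
            d[r][lane] = mem[lane * 4 + r];
        }
    }
}
```

With a key of only four words, the second structure lies past the end of the array, which is worth keeping in mind when reading the lane 1 values.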
The problem is that I need the value at a certain index of the array, and that index is only calculated at runtime.
In C, what I want to achieve would look like this:
// calculations
int i = ... // 'i' is the index of value in the array
int result = key[i];
In assembly, I tried this:
VMOV r8, s22 ;@ copy the calculated index into an arm register
MOV r8, r8, LSL #0x2 ;@ multiply by 4
ADD r8, r5, r8 ;@ add offset to base address
VLDR.32 d14, [r8] ;@ load from address into d-register
I also tried multiplying by 2 and by 32 instead of 4, but I always get the value 3.
I got it working with this clumsy and very slow solution:
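As a sanity check on the scaling factor: each uint32_t is 4 bytes, so the byte offset of key[i] is i * 4, which is exactly what LSL #2 computes. Multiplying by 2 (LSL #1) gives offsets 0, 2, 4, 6, which land mid-word for odd i, and multiplying by 32 (LSL #5) gives offsets far beyond the 16-byte array. The C sketch below mirrors the MOV/ADD pair; the helper names byte_offset and load_key_word are hypothetical. If it is right, 4 is the correct factor and the problem lies elsewhere (for example in the load instruction or the base register).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C equivalent of "MOV r8, r8, LSL #shift": byte offset = index << shift. */
size_t byte_offset(uint32_t index, unsigned shift)
{
    return (size_t)index << shift;
}

/* C equivalent of the ADD plus a 32-bit load (hypothetical helper).
 * With shift == 2 (multiply by 4 == sizeof(uint32_t)) the computed
 * address is exactly &key[i]. */
uint32_t load_key_word(const uint32_t key[4], uint32_t i)
{
    assert(i < 4);                                        /* index must be 0..3 */
    const unsigned char *base = (const unsigned char *)key;
    const uint32_t *p = (const uint32_t *)(base + byte_offset(i, 2));
    return *p;                                            /* same as key[i] */
}
```

For i == 3, byte_offset(3, 2) is 12, the start of the last element, whereas byte_offset(3, 5) is 96, well outside the array.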
;@ <--- very slow and ugly --->
VLD4.32 {d6-d9}, [r1] ;@ de-interleaving load of 8x32 bit from *r1; lane 0 of d6-d9 = first 4 words
VMOV r6, s22 ;@ r6 now contains the offset which is either 0,1,2 or 3
CMP r6, #0x0 ;@ r6 - 0 == 0 -> Z set
BEQ equal0
CMP r6, #0x1
BEQ equal1
CMP r6, #0x2
BEQ equal2
VMOV d12, d9 ;@ has to be 3
B continue
equal0:
VMOV d12, d6
B continue
equal1:
VMOV d12, d7
B continue
equal2:
VMOV d12, d8
B continue
continue:
;@ <--- --->
Basically, I have one branch for every possible index and then copy the corresponding register.
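In C terms, the compare-and-branch chain is just a hand-written selection by index, and an indexed load gives the same result without any branches. The sketch below shows both side by side on lane 0 values; the function names select_by_branches and select_by_index are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C mirror of the compare-and-branch chain (hypothetical helper).
 * Like the assembly, any index other than 0, 1 or 2 falls through
 * to the last value. */
uint32_t select_by_branches(const uint32_t key[4], uint32_t i)
{
    if (i == 0) return key[0];   /* equal0: d12 = d6 */
    if (i == 1) return key[1];   /* equal1: d12 = d7 */
    if (i == 2) return key[2];   /* equal2: d12 = d8 */
    return key[3];               /* "has to be 3": d12 = d9 */
}

/* The indexed load: base + i * 4, no branches (hypothetical helper). */
uint32_t select_by_index(const uint32_t key[4], uint32_t i)
{
    assert(i < 4);
    return key[i];
}
```

For every valid index 0..3 both functions return the same word; they differ only for out-of-range indices, where the branch version silently returns key[3].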
Thanks!
Edit:
Okay, it works with VLD1.32 d14, [r8]. I don't quite understand why it won't work with VLDR.32, though.
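A note on what the working instruction actually reads: VLD1.32 d14, [r8] loads two consecutive words, so lane 0 (s28) gets key[i] and lane 1 (s29) gets the following word; for i == 3 that second word lies past the end of key[4]. The single-lane form VLD1.32 {d14[0]}, [r8] reads only the one word. In the ARM reference, the .32 form of VLDR takes an S register (for example VLDR.32 s28, [r8], s28 being the low half of d14) and the .64 form a D register; how the assembler handled VLDR.32 d14 here is not clear from the post. The C model below is a sketch of the two VLD1 variants; the type dreg and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 32-bit lanes of a D register: lane[0] is s(2n), lane[1] is s(2n+1). */
typedef struct { uint32_t lane[2]; } dreg;

/* Model of "VLD1.32 dN, [addr]" with addr == &buf[i] (hypothetical helper):
 * reads two consecutive words, so buf[i + 1] must be readable too. */
dreg model_vld1_32(const uint32_t *buf, size_t buf_len, size_t i)
{
    assert(i + 1 < buf_len);        /* the second word must be inside the buffer */
    dreg d = { { buf[i], buf[i + 1] } };
    return d;
}

/* Model of the single-lane form "VLD1.32 {dN[0]}, [addr]" (hypothetical helper):
 * only buf[i] is read; lane 1 keeps its previous contents. */
dreg model_vld1_32_lane0(dreg d, const uint32_t *buf, size_t buf_len, size_t i)
{
    assert(i < buf_len);
    d.lane[0] = buf[i];
    return d;
}
```

With a 4-word key, the two-lane model only accepts i in 0..2, while the single-lane model works for all four indices.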