Execution freezes when I try to allocate Array in Armv8 assembly

I am programming in assembly. This is just simple code meant to help me learn how to allocate arrays so that I can use them in NEON programming later.

ASM_FUNC(FPE)
.data
.balign 8

array: .skip 80 
array1: .word 10,20,30,40

.text

ldr x0,=array
mov x1,#10

check: 
      cmp x1,#1
      bne loop
      b exit

loop:
      str x1,[x0],#8 //Stores the value in x1 at the address in x0, then advances x0 by 8 bytes
      sub x1,x1,#1   //x1--
      b check


exit:
      mov x0,#11
      ret

Some parts were commented out so I could try to find where the code breaks (I don't have a debugger on my system).
I started by commenting out the calculation part and added a mov x0,#11 right before the ret to see whether the problem was in the calculation. It turned out it was not. When I uncommented array: .skip 80 and ldr x0,=array, my application would just hang there with no response.

Can anyone please tell me what I am doing wrong? I am writing A64 assembly for Armv8.

The entry point is called from this c program:

void  PocAsm_EntryPoint ( )
    {
    
    
          Print(L"========== ASM ==========\n");
       
        
          UINT32 fff = FPE();
          Print(L" %d \n",fff);
        
          Print(L"=========== ASM ===========\n");
        
          Print(L"Test version 0.24 \n");
      return 0;
    }

Unfortunately, I could not find the definition of Print, so I apologize for not including it.

There is 1 answer

Frant On

This is an attempt to answer the following question: does the FPE() function work as expected once everything else is removed from the equation, using standard tools such as qemu-system-aarch64 and GDB?

The code for the FPE() function will be compiled for a Cortex-A53 qemu-virt machine.

Prerequisites:

  • qemu-system-aarch64 is installed:

Ubuntu 20.04: sudo apt-get install qemu-system-arm
Windows 10: download and install the qemu-w64-setup-20201120.exe installer from here.

  • the aarch64-none-elf toolchain for Cortex-A is installed. It can be downloaded from the Arm website. There are versions for both Linux and Windows 10.

FPE.s:

        .arch armv8-a
        .file   "FPE.s"

        .data
        .balign 8
        .globl array
array:  .skip 80 
array1: .word 10,20,30,40

        .text
        .align  2
        .globl FPE
FPE:
        ldr x0,=array
        mov x1,#10

check: 
        cmp x1,#1
        bne loop
        b exit

loop:
        str x1,[x0],#8  //Stores the value in x1 at the address in x0, then advances x0 by 8 bytes
        sub x1,x1,#1    //x1--
        b check

exit:
        mov x0,#11
        ret
        .end
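
To make it easier to predict what the array should contain once the exit label is reached, the loop can be modelled in C. This is only a sketch of the control flow in FPE.s, not part of the program that was tested; the function name fpe_model and its parameters are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the loop in FPE.s (name invented for illustration).
 * slots stands in for "array": 80 bytes seen as ten 64-bit elements.
 * Returns the number of elements written. */
size_t fpe_model(uint64_t *slots, size_t nslots)
{
    assert(slots != NULL);

    uint64_t x1 = 10;            /* mov x1,#10 */
    size_t   n  = 0;             /* index that mirrors the post-incremented x0 */

    while (x1 != 1) {            /* check: cmp x1,#1 / bne loop */
        assert(n < nslots);      /* the assembly has no such bounds check */
        slots[n++] = x1;         /* str x1,[x0],#8 */
        x1 -= 1;                 /* sub x1,x1,#1 */
    }
    return n;                    /* exit: reached once x1 == 1 */
}
```

Called with a zero-initialised array of ten elements, this writes 10, 9, ..., 2 into the first nine elements and leaves the tenth at 0, because the loop stops as soon as x1 reaches 1.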

startup.s:

                .title startup64.s
                .arch armv8-a
                .text
                .section .text.startup,"ax"    
                .globl _start
_start:
                ldr x0, =__StackTop
                mov sp, x0
                bl FPE
wait:           wfe
                b wait
                .end

Building:

We will build FPE.elf for the qemu-virt machine (RAM starts at 0x40000000):

/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-gcc -nostdlib -nostartfiles -ffreestanding -g -Wl,--defsym,__StackTop=0x40010000 -Wl,--section-start=.text=0x40000000 -o FPE.elf startup.s FPE.s

Debugging:

Start qemu in a shell:

/opt/qemu-5.1.0/bin/qemu-system-aarch64  -semihosting -m 1M -nographic -serial telnet::4444,server,nowait -machine virt,gic-version=2,secure=on,virtualization=on -S -gdb tcp::1234,ipv4 -cpu cortex-a53 -kernel FPE.elf

Start GDB:

/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-gdb  --quiet -nx -ex 'target remote localhost:1234' -ex 'load' --ex 'b _start' -ex 'b exit' FPE.elf

GDB should start:

Reading symbols from FPE.elf...
Remote debugging using localhost:1234
_start () at startup.s:7
7                       ldr x0, =__StackTop
Loading section .text, size 0x50 lma 0x40000000
Loading section .data, size 0x60 lma 0x40010050
Start address 0x40000000, load size 176
Transfer rate: 85 KB/sec, 88 bytes/write.
Breakpoint 1 at 0x40000000: file startup.s, line 7.
Breakpoint 2 at 0x40000040: file FPE.s, line 28.

From this point, the commands stepi, p/x $x0, and x/10g 0x40010050 can be used to monitor the program's behavior until it reaches the exit label.

Here we will just display the 10 elements of the array at the _start and exit breakpoints:

(gdb) x/10g 0x40010050
0x40010050:     0       0
0x40010060:     0       0
0x40010070:     0       0
0x40010080:     0       0
0x40010090:     0       0
(gdb) continue
Continuing.

Breakpoint 2, exit () at FPE.s:28
28              mov x0,#11
(gdb) x/10g 0x40010050
0x40010050:     10      9
0x40010060:     8       7
0x40010070:     6       5
0x40010080:     4       3
0x40010090:     2       0
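
The addresses in the x/10g output follow from the layout in FPE.s: x/10g prints ten 8-byte ("giant") words, two per row, starting at the address of array. A small sketch of that arithmetic in C, assuming the base address 0x40010050 reported by GDB above; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_BASE   UINT64_C(0x40010050)  /* address of "array" as reported by GDB */
#define ARRAY_BYTES  80u                   /* array:  .skip 80 */
#define ARRAY1_BYTES (4u * 4u)             /* array1: four 32-bit .word values */

/* Address of the 64-bit element with index i, as written by str x1,[x0],#8. */
uint64_t element_address(unsigned i)
{
    assert(i < ARRAY_BYTES / 8);           /* only ten 8-byte elements fit */
    return ARRAY_BASE + 8u * (uint64_t)i;
}

/* Total size of .data: array followed directly by array1. */
unsigned data_section_size(void)
{
    return ARRAY_BYTES + ARRAY1_BYTES;
}
```

element_address(9) is 0x40010098, the second column of the last row in the dump, and data_section_size() is 96 (0x60), which matches the size GDB printed when loading .data.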

Single-stepping from this point shows that the program returns properly from its execution:

(gdb) stepi
29              ret
(gdb) stepi
wait () at startup.s:10
10      wait:           wfe
(gdb) stepi
11                      b wait
(gdb) stepi
10      wait:           wfe

The answer to the question would therefore be: Yes, the code for the FPE() function is working properly.

The exact same procedure can be run on Windows 10; it is just a matter of adjusting the three commands that were used for running aarch64-none-elf-gcc, qemu-system-aarch64 and GDB.


Comparing a dump of your object file with the one I tested may help you understand the issue:

/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-as -o FPE.o FPE.s
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objdump -D FPE.o 

FPE.o:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <FPE>:
   0:   58000140        ldr     x0, 28 <exit+0x8>
   4:   d2800141        mov     x1, #0xa                        // #10

0000000000000008 <check>:
   8:   f100043f        cmp     x1, #0x1
   c:   54000041        b.ne    14 <loop>  // b.any
  10:   14000004        b       20 <exit>

0000000000000014 <loop>:
  14:   f8008401        str     x1, [x0], #8
  18:   d1000421        sub     x1, x1, #0x1
  1c:   17fffffb        b       8 <check>

0000000000000020 <exit>:
  20:   d2800160        mov     x0, #0xb                        // #11
  24:   d65f03c0        ret
        ...

Disassembly of section .data:

0000000000000000 <array>:
        ...

0000000000000050 <array1>:
  50:   0000000a        .inst   0x0000000a ; undefined
  54:   00000014        .inst   0x00000014 ; undefined
  58:   0000001e        .inst   0x0000001e ; undefined
  5c:   00000028        .inst   0x00000028 ; undefined

Dumping the complete ELF file of the minimal example would give:

/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objdump -D FPE.elf

FPE.elf:     file format elf64-littleaarch64


Disassembly of section .text:

0000000040000000 <_start>:
    40000000:   580000c0        ldr     x0, 40000018 <wait+0xc>
    40000004:   9100001f        mov     sp, x0
    40000008:   94000006        bl      40000020 <FPE>

000000004000000c <wait>:
    4000000c:   d503205f        wfe
    40000010:   17ffffff        b       4000000c <wait>
    40000014:   00000000        .inst   0x00000000 ; undefined
    40000018:   40010000        .inst   0x40010000 ; undefined
    4000001c:   00000000        .inst   0x00000000 ; undefined

0000000040000020 <FPE>:
    40000020:   58000140        ldr     x0, 40000048 <exit+0x8>
    40000024:   d2800141        mov     x1, #0xa                        // #10

0000000040000028 <check>:
    40000028:   f100043f        cmp     x1, #0x1
    4000002c:   54000041        b.ne    40000034 <loop>  // b.any
    40000030:   14000004        b       40000040 <exit>

0000000040000034 <loop>:
    40000034:   f8008401        str     x1, [x0], #8
    40000038:   d1000421        sub     x1, x1, #0x1
    4000003c:   17fffffb        b       40000028 <check>

0000000040000040 <exit>:
    40000040:   d2800160        mov     x0, #0xb                        // #11
    40000044:   d65f03c0        ret
    40000048:   40010050        .inst   0x40010050 ; undefined
    4000004c:   00000000        .inst   0x00000000 ; undefined

Disassembly of section .data:

0000000040010050 <__data_start>:
        ...

00000000400100a0 <array1>:
    400100a0:   0000000a        .inst   0x0000000a ; undefined
    400100a4:   00000014        .inst   0x00000014 ; undefined
    400100a8:   0000001e        .inst   0x0000001e ; undefined
    400100ac:   00000028        .inst   0x00000028 ; undefined
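
In this dump, the literal slot after the ret in FPE (at 0x40000048 and 0x4000004c) is displayed as two 32-bit words because objdump -D disassembles it as if it were code. Read as one little-endian 64-bit value, the two words form the address that ldr x0,=array loads. A minimal sketch of that combination; the function name is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two consecutive 32-bit words, as printed by objdump for a
 * little-endian image, into the 64-bit value stored at the lower address.
 * words[0] is the word at the lower address.
 * Hypothetical helper, for illustration only. */
uint64_t literal_from_words(const uint32_t words[2])
{
    assert(words != NULL);
    return ((uint64_t)words[1] << 32) | words[0];
}
```

With the words 0x40010050 and 0x00000000 taken from the dump, the result is 0x40010050, the start of .data where array lives; adding the 80 bytes reserved by .skip 80 gives 0x400100a0, the address shown for array1.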