Clang ignores -mstack-alignment=XX flag


This is related to an issue described in this question; a reproducible example can be found there, along with a description of the environment (briefly: Apple Silicon, macOS Sonoma, clang 15).

One hypothesized fix for the issue presented there is to change the stack pointer alignment. Clang has an option for this, -mstack-alignment, as well as the related option -mstackrealign.

Starting from the example given in the previous question, I edited the Makefile and added -mstack-alignment=64 to CFLAGS. I saw no performance effect, no change in the printed value of the stack pointer, and no change in the generated assembly code. So I tried a simple experiment: compile once with -mstack-alignment=16 (which I understand is the default requirement of the ARM ABI) and once with -mstack-alignment=64, then compare the two binaries. They are bit-for-bit identical.

Why is this option being ignored, and is there anything I can do to actually force the stack alignment?

EDIT: as requested, here is a very small example:

#include <stdio.h>
#include <stdlib.h>

void* f() __attribute__((naked)) {
    asm volatile(
        "mov x0, sp\n\t"
        "ret\n\t"
    );
}

int main(int argc, char *argv[]) {
    int iters = atoi(argv[1]);

    printf("argv = %p argv[0] = %p sp = %p\n", (void*)argv, (void*)argv[0], (void*)__builtin_frame_address(0));
    
    for (int i = 0; i < iters; i++) {
        printf("sp = %p\n", f());
    }

    return 0;
}

Assuming it's saved as example.c, compile it with clang -O3 -o example example.c -Wno-gcc-compat -mstack-alignment=64. Then try 16 instead of 64 and compare the executables (they're bit-for-bit identical in my environment). It can be run with e.g. ./example 1. This is the output when running the code after compiling with a (supposedly) 64-byte stack alignment:

argv = 0x16ce17860 argv[0] = 0x16ce179d8 sp = 0x16ce17610
sp = 0x16ce175e0

As you can see from the second line of the output, the actual sp on entry to f is aligned to 32 bytes, not 64. By changing the file name I can also get an sp that is aligned to only 16 bytes.

This is the full objdump -d output for the binary:

example:    file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000100003ed8 <_f>:
100003ed8: 910003e0     mov x0, sp
100003edc: d65f03c0     ret
100003ee0: d4200020     brk #0x1

0000000100003ee4 <_main>:
100003ee4: a9be4ff4     stp x20, x19, [sp, #-32]!
100003ee8: a9017bfd     stp x29, x30, [sp, #16]
100003eec: 910043fd     add x29, sp, #16
100003ef0: d10083e9     sub x9, sp, #32
100003ef4: 927df13f     and sp, x9, #0xfffffffffffffff8
100003ef8: aa0103f4     mov x20, x1
100003efc: f9400420     ldr x0, [x1, #8]
100003f00: 94000017     bl  0x100003f5c <_printf+0x100003f5c>
100003f04: aa0003f3     mov x19, x0
100003f08: f9400288     ldr x8, [x20]
100003f0c: a900f7e8     stp x8, x29, [sp, #8]
100003f10: f90003f4     str x20, [sp]
100003f14: 90000000     adrp    x0, 0x100003000 <_main+0x30>
100003f18: 913dd000     add x0, x0, #3956
100003f1c: 94000013     bl  0x100003f68 <_printf+0x100003f68>
100003f20: 7100067f     cmp w19, #1
100003f24: 5400012b     b.lt    0x100003f48 <_main+0x64>
100003f28: 90000014     adrp    x20, 0x100003000 <_main+0x44>
100003f2c: 913e5294     add x20, x20, #3988
100003f30: 97ffffea     bl  0x100003ed8 <_f>
100003f34: f90003e0     str x0, [sp]
100003f38: aa1403e0     mov x0, x20
100003f3c: 9400000b     bl  0x100003f68 <_printf+0x100003f68>
100003f40: 71000673     subs    w19, w19, #1
100003f44: 54ffff61     b.ne    0x100003f30 <_main+0x4c>
100003f48: 52800000     mov w0, #0
100003f4c: d10043bf     sub sp, x29, #16
100003f50: a9417bfd     ldp x29, x30, [sp, #16]
100003f54: a8c24ff4     ldp x20, x19, [sp], #32
100003f58: d65f03c0     ret

Disassembly of section __TEXT,__stubs:

0000000100003f5c <__stubs>:
100003f5c: b0000010     adrp    x16, 0x100004000 <__stubs+0x4>
100003f60: f9400210     ldr x16, [x16]
100003f64: d61f0200     br  x16
100003f68: b0000010     adrp    x16, 0x100004000 <__stubs+0x10>
100003f6c: f9400610     ldr x16, [x16, #8]
100003f70: d61f0200     br  x16

As you can see, there is some logic at address 0x100003ef4 that aligns sp to an 8-byte boundary, but I don't see anything else that might enforce the 64-byte alignment I actually requested. Indeed, depending on the starting value of sp (which we've established depends on argv[0], i.e. the name of the executable), it is certainly possible, and in fact observed, that the requested alignment is not met.

EDIT: Godbolt link for the above. There are two compilation panes, one with -mstack-alignment=16 and the other with -mstack-alignment=64. As far as I can tell, the generated code is identical.
