This is related to an issue described in this question -- a reproducible example can be found there, as well as a description of the environment (briefly: Apple Silicon with macOS Sonoma and clang 15).
A hypothesized fix to the issue presented there would be to change the stack pointer alignment, for which there exists a clang option, -mstack-alignment, as well as the related option -mstackrealign.
Starting from the example given in the previous question, I edited the Makefile and added -mstack-alignment=64 to CFLAGS. I saw no performance effect, no change in the printed value of the stack pointer, and no change in the generated assembly code. So I tried a simple experiment: compile once with -mstack-alignment=16 (which I understand is the default requirement of the ARM ABI) and once with -mstack-alignment=64, then compare the two binaries. They are bit-for-bit identical.
My question is: why is this option being ignored, and is there something I can do to force the requested stack alignment?
EDIT: as requested, here is a very small example:
#include <stdio.h>
#include <stdlib.h>

void* f() __attribute__((naked)) {
    asm volatile(
        "mov x0, sp\n\t"
        "ret\n\t"
    );
}

int main(int argc, char *argv[]) {
    int iters = atoi(argv[1]);
    printf("argv = %p argv[0] = %p sp = %p\n", (void*)argv, (void*)argv[0], (void*)__builtin_frame_address(0));
    for (int i = 0; i < iters; i++) {
        printf("sp = %p\n", f());
    }
    return 0;
}
Assuming it's saved as example.c, compile with clang -O3 -o example example.c -Wno-gcc-compat -mstack-alignment=64. Also try with 16 instead of 64 and compare the executables (they're bit-for-bit identical in my environment). It can be run with e.g. ./example 1. This is the output when running the code after compiling with a (supposedly) 64-byte stack alignment:
argv = 0x16ce17860 argv[0] = 0x16ce179d8 sp = 0x16ce17610
sp = 0x16ce175e0
As you can see (second line of the output), the actual sp when entering the function is aligned to 32 bytes, not 64. By changing the file name I also get an sp that is only aligned to 16 bytes.
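For anyone who wants to check the alignment of a printed address without counting bits by hand, here is a minimal sketch. The helper name alignment_of is hypothetical and not part of the example above; it returns the largest power of two that divides the address by isolating its lowest set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the original example):
 * returns the largest power of two that divides addr. */
uint64_t alignment_of(uint64_t addr) {
    assert(addr != 0);          /* zero is divisible by every power of two */
    return addr & (~addr + 1);  /* isolate the lowest set bit */
}
```

Applied to the values printed above, alignment_of(0x16ce175e0) gives 32 and alignment_of(0x16ce17610) gives 16, so neither address is 64-byte aligned.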
This is the full objdump -d output for the binary:
example: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000100003ed8 <_f>:
100003ed8: 910003e0 mov x0, sp
100003edc: d65f03c0 ret
100003ee0: d4200020 brk #0x1
0000000100003ee4 <_main>:
100003ee4: a9be4ff4 stp x20, x19, [sp, #-32]!
100003ee8: a9017bfd stp x29, x30, [sp, #16]
100003eec: 910043fd add x29, sp, #16
100003ef0: d10083e9 sub x9, sp, #32
100003ef4: 927df13f and sp, x9, #0xfffffffffffffff8
100003ef8: aa0103f4 mov x20, x1
100003efc: f9400420 ldr x0, [x1, #8]
100003f00: 94000017 bl 0x100003f5c <_printf+0x100003f5c>
100003f04: aa0003f3 mov x19, x0
100003f08: f9400288 ldr x8, [x20]
100003f0c: a900f7e8 stp x8, x29, [sp, #8]
100003f10: f90003f4 str x20, [sp]
100003f14: 90000000 adrp x0, 0x100003000 <_main+0x30>
100003f18: 913dd000 add x0, x0, #3956
100003f1c: 94000013 bl 0x100003f68 <_printf+0x100003f68>
100003f20: 7100067f cmp w19, #1
100003f24: 5400012b b.lt 0x100003f48 <_main+0x64>
100003f28: 90000014 adrp x20, 0x100003000 <_main+0x44>
100003f2c: 913e5294 add x20, x20, #3988
100003f30: 97ffffea bl 0x100003ed8 <_f>
100003f34: f90003e0 str x0, [sp]
100003f38: aa1403e0 mov x0, x20
100003f3c: 9400000b bl 0x100003f68 <_printf+0x100003f68>
100003f40: 71000673 subs w19, w19, #1
100003f44: 54ffff61 b.ne 0x100003f30 <_main+0x4c>
100003f48: 52800000 mov w0, #0
100003f4c: d10043bf sub sp, x29, #16
100003f50: a9417bfd ldp x29, x30, [sp, #16]
100003f54: a8c24ff4 ldp x20, x19, [sp], #32
100003f58: d65f03c0 ret
Disassembly of section __TEXT,__stubs:
0000000100003f5c <__stubs>:
100003f5c: b0000010 adrp x16, 0x100004000 <__stubs+0x4>
100003f60: f9400210 ldr x16, [x16]
100003f64: d61f0200 br x16
100003f68: b0000010 adrp x16, 0x100004000 <__stubs+0x10>
100003f6c: f9400610 ldr x16, [x16, #8]
100003f70: d61f0200 br x16
As you can see, there is some logic at address 0x100003ef4 to align sp to an 8-byte boundary, but I don't see anything else that might enforce the requested 64-byte alignment. Indeed, depending on the starting value of sp (which we've established depends on argv[0], i.e. the name of the executable), it is possible (and in fact observed) that the requested alignment is not met.
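To make the mask arithmetic concrete, here is a small sketch of what rounding an address down to a power-of-two boundary looks like in C. The name align_down is hypothetical. For an alignment of 8 the mask is 0xfffffffffffffff8, the immediate seen in the and instruction above; a 64-byte alignment would presumably need the mask 0xffffffffffffffc0 instead, which appears nowhere in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
 * align must be a nonzero power of two. */
uint64_t align_down(uint64_t addr, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);  /* clear the low log2(align) bits */
}
```

For example, align_down(0x16ce175e0, 8) leaves the address unchanged, while align_down(0x16ce175e0, 64) yields 0x16ce175c0.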
EDIT: Godbolt link for the above. There are two compilation panes, one with -mstack-alignment=16 and the other with -mstack-alignment=64. As far as I can tell, the generated code is identical.