I'm trying to get an existing JIT working on Windows x86_64 using mingw64.
I'm getting segfaults when the JIT calls back into precompiled code and that code calls Windows APIs: aligned move instructions such as movaps within the Windows API implementations are executed with %rsp not a multiple of 16, i.e. the stack isn't aligned to a 16-byte boundary.
Thread 1 hit Catchpoint 2 (signal SIGSEGV), 0x00007fff5865142d in KERNELBASE!FindFirstFileA () from C:\WINDOWS\System32\KernelBase.dll
1: x/i $pc
=> 0x7fff5865142d <KERNELBASE!FindFirstFileA+125>: movaps 0x60(%rsp),%xmm0
2: /x $rsp = 0xd8edd8
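To make the fault concrete, here is a minimal sketch of the arithmetic behind the gdb output above. The helper names are made up for illustration; the two constants come straight from the trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* movaps faults unless its memory operand is a multiple of 16. */
bool is_aligned_16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Effective address of the faulting operand 0x60(%rsp),
 * using the $rsp value printed by gdb. */
uintptr_t faulting_operand(void)
{
    const uintptr_t rsp = 0xd8edd8;
    return rsp + 0x60; /* 0xd8ee38: low nibble is 8, so not 16-byte aligned */
}
```

Since 0x60 is itself a multiple of 16, the operand is misaligned exactly when %rsp is.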
As what I expected to be a quick workaround, I tried to get gcc to force a realignment of the stack on entry to the precompiled functions that are called by the JIT code and ultimately call Windows API functions.
The gcc docs for the force_align_arg_pointer attribute say:
On x86 targets, the force_align_arg_pointer attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the run-time stack if necessary. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility.
However, adding __attribute__((force_align_arg_pointer)) to the function specifiers had no effect on the output assembly.
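For reference, this is roughly how the attribute was applied. It is a reduced sketch rather than the real code: the name foo matches the assembly shown further down, but the string literal and the return type are placeholders.

```c
#include <stdio.h>

/* Precompiled function that JIT-generated code calls into.
 * The attribute asks gcc for a prologue that realigns the stack. */
__attribute__((force_align_arg_pointer))
int foo(void)
{
    /* Placeholder body: the real function ends up in a Windows API call. */
    return printf("hello from precompiled code\n");
}
```

On this toolchain, compiling with and without the attribute produced identical assembly for the function.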
I also tried -mpreferred-stack-boundary=4, which explicitly requests 2**4 == 16 byte alignment for all functions:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary.
This also had no effect.
In fact, the first thing I found that did affect the output assembly was -mpreferred-stack-boundary=3 (which should keep the stack aligned to an 8-byte boundary).
That resulted in this difference:
@@ -46,8 +59,15 @@
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
foo:
+ pushq %rbp
+ .seh_pushreg %rbp
+ movq %rsp, %rbp
+ .seh_setframe %rbp, 0
+ andq $-16, %rsp
.seh_endprologue
leaq .LC0(%rip), %rcx
+ movq %rbp, %rsp
+ popq %rbp
jmp printf
.seh_endproc
.def __main; .scl 2; .type 32; .endef
Strangely, this actually inserts andq $-16, %rsp (aligning the stack pointer to a multiple of 16) despite the fact that we asked to prefer 8-byte alignment.
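As a sanity check on what that instruction does, here is the same operation written in C. The function name is invented for illustration, and the sample value is the misaligned $rsp from the gdb trace, used only as an example input.

```c
#include <stdint.h>

/* C equivalent of `andq $-16, %rsp`: -16 is ...11110000 in two's
 * complement, so the AND clears the low four bits and rounds the
 * value down to the nearest multiple of 16. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)0xF;
}
```

For example, align_down_16(0xd8edd8) gives 0xd8edd0, and a value that is already a multiple of 16 is left unchanged.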
What am I misunderstanding about these options or the cases they work in?
The version of gcc is MSYS2 mingw64's 10.2.0:
$ gcc --version
gcc.exe (Rev4, Built by MSYS2 project) 10.2.0
The correct workaround would be -mincoming-stack-boundary=3: you should be telling the compiler that the function it compiles may be called with an under-aligned stack (hence "incoming" rather than "preferred": you don't need to raise the preferred alignment above the default).
As to why the attribute doesn't work, it seems you've found a compiler backend bug specific to the 64-bit Microsoft ABI. The attribute works as you would expect when targeting Linux, but there's some special-casing for Microsoft (and Apple) ABIs in the backend, and it's possible the code does not align with the intended behavior:
(note how the comment refers to the attribute, but the code evidently does not work that way)
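A minimal sketch of applying the -mincoming-stack-boundary=3 workaround, assuming the functions the JIT calls into live in their own translation unit. The file name, the build line and the function name are all hypothetical. The function only reports whether a local that requests 16-byte alignment actually got it, which is a convenient way to check the result when it is entered from JIT code.

```c
/* callbacks.c: hypothetical file holding the functions the JIT calls into.
 * Assumed build line, applying the flag to this file only:
 *   gcc -O2 -mincoming-stack-boundary=3 -c callbacks.c
 */
#include <stdint.h>

/* Returns 1 if a local that requests 16-byte alignment really is aligned. */
int callback_locals_aligned(void)
{
    /* volatile keeps the array on the stack instead of being optimized away */
    _Alignas(16) volatile unsigned char scratch[16] = {0};
    return ((uintptr_t)scratch & 0xF) == 0;
}
```

With the flag, gcc should assume only 8-byte alignment on entry and realign in the prologue whenever the function needs more, so this should return 1 even when the caller's stack is misaligned.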