int 0x80 on Linux always invokes the 32-bit ABI, regardless of what mode it's called from: args in ebx, ecx, ... and syscall numbers from /usr/include/asm/unistd_32.h. (Or crashes on 64-bit kernels compiled without CONFIG_IA32_EMULATION).
64-bit code should use syscall, with call numbers from /usr/include/asm/unistd_64.h, and args in rdi, rsi, etc. See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64. If your question was marked a duplicate of this, see that link for details on how you should make system calls in 32 or 64-bit code. If you want to understand what exactly happened, keep reading.
(For an example of 32-bit vs. 64-bit sys_write, see Using interrupt 0x80 on 64-bit Linux)
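As a minimal sketch of the native 64-bit convention described above (call number in rax, args in rdi, rsi, rdx), here is a write() made with the syscall instruction from C using GNU inline asm. The helper name raw_write64 and the demo function are made up for illustration, and the non-x86-64 branch is only a fallback assumption so the sketch compiles elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write: the 64-bit call number in a 64-bit build */
#include <unistd.h>

/* raw_write64 is a hypothetical helper name, not a libc function. */
long raw_write64(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret = SYS_write;                     /* call number goes in rax */
    __asm__ volatile (
        "syscall"
        : "+a"(ret)                           /* rax: number in, return value out */
        : "D"((long)fd), "S"(buf), "d"(len)   /* rdi, rsi, rdx */
        : "rcx", "r11", "memory");            /* syscall itself destroys rcx and r11 */
    return ret;                               /* >= 0 on success, -errno on failure */
#else
    /* Fallback for other targets so the sketch still compiles: go through libc. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -errno : ret;
#endif
}

/* Usage check: errors come back as a negative errno value in rax. */
void raw_write64_demo(void)
{
    assert(raw_write64(-1, "x", 1) == -EBADF);
}
```

The same source built with -m32 would need the 32-bit numbers and ebx, ecx, edx instead, which is exactly the mismatch the rest of this Q&A is about.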
syscall system calls are faster than int 0x80 system calls, so use native 64-bit syscall unless you're writing polyglot machine code that runs the same when executed as 32 or 64 bit. (sysenter always returns in 32-bit mode, so it's not useful from 64-bit userspace, although it is a valid x86-64 instruction.)
Related: The Definitive Guide to Linux System Calls (on x86) for how to make int 0x80 or sysenter 32-bit system calls, or syscall 64-bit system calls, or calling the vDSO for "virtual" system calls like gettimeofday. Plus background on what system calls are all about.
Using int 0x80 makes it possible to write something that will assemble in 32 or 64-bit mode, so it's handy for an exit_group() at the end of a microbenchmark or something.
Current PDFs of the official i386 and x86-64 System V psABI documents that standardize function and syscall calling conventions are linked from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.
See the x86 tag wiki for beginner guides, x86 manuals, official documentation, and performance optimization guides / resources.
But since people keep posting questions with code that uses int 0x80 in 64-bit code, or accidentally building 64-bit binaries from source written for 32-bit, I wonder what exactly does happen on current Linux?
Does int 0x80 save/restore all the 64-bit registers? Does it truncate any registers to 32-bit? What happens if you pass pointer args that have non-zero upper halves?
Does it work if you pass it 32-bit pointers?
TL:DR:
int 0x80 works when used correctly, as long as any pointers fit in 32 bits (stack pointers don't fit). But beware that strace decodes it wrong unless you have a very recent strace + kernel.

int 0x80 zeroes r8-r11 for reasons, and preserves everything else. Use it exactly like you would in 32-bit code, with the 32-bit call numbers. (Or better, don't use it!)

Not all systems even support int 0x80: The Windows Subsystem for Linux version 1 (WSL1) is strictly 64-bit only: int 0x80 doesn't work at all. It's also possible to build Linux kernels without IA-32 emulation. (No support for 32-bit executables, no support for 32-bit system calls.) See this re: making sure your WSL is actually WSL2 (which uses an actual Linux kernel in a VM).

The details: what's saved/restored, which parts of which regs the kernel uses
int 0x80 uses eax (not the full rax) as the system-call number, dispatching to the same table of function-pointers that 32-bit user-space int 0x80 uses. (These pointers are to sys_whatever implementations or wrappers for the native 64-bit implementation inside the kernel. System calls are really function calls across the user/kernel boundary.)

Only the low 32 bits of arg registers are passed. The upper halves of rbx-rbp are preserved, but ignored by int 0x80 system calls. Note that passing a bad pointer to a system call doesn't result in SIGSEGV; instead the system call returns -EFAULT. If you don't check error return values (with a debugger or tracing tool), it will appear to silently fail.

All registers (except eax of course) are saved/restored (including RFLAGS, and the upper 32 bits of integer regs), except that r8-r11 are zeroed.
r12-r15 are call-preserved in the x86-64 SysV ABI's function calling convention, so the registers that get zeroed by int 0x80 in 64-bit are the call-clobbered subset of the "new" registers that AMD64 added.

This behaviour has been preserved over some internal changes to how register-saving was implemented inside the kernel, and comments in the kernel mention that it's usable from 64-bit, so this ABI is probably stable. (I.e. you can count on r8-r11 being zeroed, and everything else being preserved.)
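The rules in this section can be written down as a small model of what 64-bit user-space observes around an int 0x80. This is only a sketch in plain C (it does not execute int 0x80); the struct and function names are invented for illustration, and the sign-extension of the return value is the behaviour this answer describes for rax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type: just the integer registers discussed here. */
struct user_regs {
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* The six args as the 32-bit ABI sees them: low halves of ebx, ecx, edx, esi, edi, ebp. */
void int80_args_seen(const struct user_regs *r, uint32_t args[6])
{
    args[0] = (uint32_t)r->rbx;
    args[1] = (uint32_t)r->rcx;
    args[2] = (uint32_t)r->rdx;
    args[3] = (uint32_t)r->rsi;
    args[4] = (uint32_t)r->rdi;
    args[5] = (uint32_t)r->rbp;
}

/* Register state after the call returns a 32-bit result. */
void int80_apply_return(struct user_regs *r, int32_t result)
{
    r->rax = (uint64_t)(int64_t)result;    /* sign-extended into all of rax */
    r->r8 = r->r9 = r->r10 = r->r11 = 0;   /* zeroed */
    /* rbx-rbp (all 64 bits) and r12-r15 are left exactly as they were */
}

void int80_model_demo(void)
{
    struct user_regs r = { .rbx = 0xaaaaaaaa00000001ULL, .r8 = 5, .r12 = 9 };
    uint32_t args[6];

    int80_args_seen(&r, args);
    assert(args[0] == 1);                     /* upper half of rbx is ignored... */
    int80_apply_return(&r, -14);              /* e.g. -EFAULT */
    assert(r.rax == 0xfffffffffffffff2ULL);
    assert(r.rbx == 0xaaaaaaaa00000001ULL);   /* ...but preserved */
    assert(r.r8 == 0 && r.r12 == 9);
}
```

A successful pointer result with its top bit set would come back with the upper 32 bits of rax all ones, so it has to be zero-extended (for example with mov eax, eax) before being dereferenced in 64-bit code.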
The return value is sign-extended to fill 64-bit rax. (Linux declares 32-bit sys_ functions as returning signed long.) This means that pointer return values (like from void *mmap()) need to be zero-extended before use in 64-bit addressing modes.

Unlike sysenter, it preserves the original value of cs, so it returns to user-space in the same mode that it was called in. (Using sysenter results in the kernel setting cs to $__USER32_CS, which selects a descriptor for a 32-bit code segment.)

Older strace decodes int 0x80 incorrectly for 64-bit processes. It decodes as if the process had used syscall instead of int 0x80. This can be very confusing. e.g. strace prints write(0, NULL, 12 <unfinished ... exit status 1> for eax=1 / int $0x80, which is actually _exit(ebx), not write(rdi, rsi, rdx).

I don't know the exact version where the PTRACE_GET_SYSCALL_INFO feature was added, but Linux kernel 5.5 / strace 5.5 handle it. It misleadingly says the process "runs in 32-bit mode" but does decode correctly. (Example).

int 0x80 works as long as all arguments (including pointers) fit in the low 32 bits of a register. This is the case for static code and data in the default code model ("small") in the x86-64 SysV ABI. (Section 3.5.1: all symbols are known to be located in the virtual addresses in the range 0x00000000 to 0x7effffff, so you can do stuff like mov edi, hello (AT&T mov $hello, %edi) to get a pointer into a register with a 5 byte instruction.)

But this is not the case for position-independent executables, which many Linux distros now configure gcc to make by default (and they enable ASLR for executables). For example, I compiled a hello.c on Arch Linux, and set a breakpoint at the start of main. The string constant passed to puts was at 0x555555554724, so a 32-bit ABI write system call would not work. (GDB disables ASLR by default, so you always see the same address from run to run, if you run from within GDB.)

Linux puts the stack near the "gap" between the upper and lower ranges of canonical addresses, i.e. with the top of the stack at 2^47-1. (Or somewhere random, with ASLR enabled.) So rsp on entry to _start in a typical statically-linked executable is something like 0x7fffffffe550, depending on size of env vars and args. Truncating this pointer to esp does not point to any valid memory, so system calls with pointer inputs will typically return -EFAULT if you try to pass a truncated stack pointer. (And your program will crash if you truncate rsp to esp and then do anything with the stack, e.g. if you built 32-bit asm source as a 64-bit executable.)

How it works in the kernel:
In the Linux source code, arch/x86/entry/entry_64_compat.S defines ENTRY(entry_INT80_compat). Both 32 and 64-bit processes use the same entry point when they execute int 0x80.

entry_64.S defines native entry points for a 64-bit kernel, which includes interrupt / fault handlers and syscall native system calls from long mode (aka 64-bit mode) processes.

entry_64_compat.S defines system-call entry-points from compat mode into a 64-bit kernel, plus the special case of int 0x80 in a 64-bit process. (sysenter in a 64-bit process may go to that entry point as well, but it pushes $__USER32_CS, so it will always return in 32-bit mode.) There's a 32-bit version of the syscall instruction, supported on AMD CPUs, and Linux supports it too for fast 32-bit system calls from 32-bit processes.

I guess a possible use-case for int 0x80 in 64-bit mode is if you wanted to use a custom code-segment descriptor that you installed with modify_ldt. int 0x80 pushes segment registers itself for use with iret, and Linux always returns from int 0x80 system calls via iret. The 64-bit syscall entry point sets pt_regs->cs and ->ss to constants, __USER_CS and __USER_DS. (It's normal that SS and DS use the same segment descriptors. Permission differences are done with paging, not segmentation.)

entry_32.S defines entry points into a 32-bit kernel, and is not involved at all.

The code zero-extends eax into rax, then pushes all the registers onto the kernel stack to form a struct pt_regs. This is where it will restore from when the system call returns. It's in a standard layout for saved user-space registers (for any entry point), so ptrace from other processes (like gdb or strace) will read and/or write that memory if they use ptrace while this process is inside a system call. (ptrace modification of registers is one thing that makes return paths complicated for the other entry points. See comments.)

But it pushes $0 instead of r8/r9/r10/r11. (sysenter and AMD syscall32 entry points store zeros for r8-r15.)

I think this zeroing of r8-r11 is to match historical behaviour. Before the "Set up full pt_regs for all compat syscalls" commit, the entry point only saved the C call-clobbered registers. It dispatched directly from asm with call *ia32_sys_call_table(, %rax, 8), and those functions follow the calling convention, so they preserve rbx, rbp, rsp, and r12-r15. Zeroing r8-r11 instead of leaving them undefined was to avoid info leaks from a 64-bit kernel to 32-bit user-space (which could far jmp to a 64-bit code segment to read anything the kernel left there).

The current implementation (Linux 4.12) dispatches 32-bit-ABI system calls from C, reloading the saved ebx, ecx, etc. from pt_regs. (64-bit native system calls dispatch directly from asm, with only a mov %r10, %rcx needed to account for the small difference in calling convention between functions and syscall. Unfortunately it can't always use sysret, because CPU bugs make it unsafe with non-canonical addresses. It does try to, so the fast-path is pretty damn fast, although syscall itself still takes tens of cycles.)

Anyway, in current Linux, 32-bit syscalls (including int 0x80 from 64-bit) eventually end up in do_syscall_32_irqs_on(struct pt_regs *regs). It dispatches to a function pointer in ia32_sys_call_table, with 6 zero-extended args. This maybe avoids needing a wrapper around the 64-bit native syscall function in more cases to preserve that behaviour, so more of the ia32 table entries can be the native system call implementation directly.

In older versions of Linux that dispatch 32-bit system calls from asm (like 64-bit still did until 4.15, see footnote 1), the int80 entry point itself puts args in the right registers with mov and xchg instructions, using 32-bit registers. It even uses mov %edx,%edx to zero-extend EDX into RDX (because arg3 happens to use the same register in both conventions). (code here). This code is duplicated in the sysenter and syscall32 entry points.

Footnote 1: Linux 4.15 (I think) introduced Spectre / Meltdown mitigations, and a major revamp of the entry points that made them a trampoline for the Meltdown case. It also sanitized the incoming registers to avoid user-space values other than actual args being in registers during the call (when some Spectre gadget might run), by storing them, zeroing everything, then calling a C wrapper that reloads just the right widths of args from the struct saved on entry.
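The dispatch step described above (index a table of function pointers with eax, pass six zero-extended args, store the result back for the return path) can be sketched as a toy model in plain C. This is not kernel code: model_pt_regs, the handlers and the table are hypothetical stand-ins, and the real kernel does considerably more work around this step. Only the 32-bit call numbers 1 (exit) and 4 (write) are taken from the real ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pt_regs. */
struct model_pt_regs { uint64_t ax, bx, cx, dx, si, di, bp; };

typedef long (*model_sys_fn)(unsigned long, unsigned long, unsigned long,
                             unsigned long, unsigned long, unsigned long);

uint64_t model_last_buf;   /* records the buffer address the write handler received */

/* Toy handlers: they only report what they were passed. */
static long model_exit(unsigned long status, unsigned long a2, unsigned long a3,
                       unsigned long a4, unsigned long a5, unsigned long a6)
{
    (void)status; (void)a2; (void)a3; (void)a4; (void)a5; (void)a6;
    return 0;
}

static long model_write(unsigned long fd, unsigned long buf, unsigned long len,
                        unsigned long a4, unsigned long a5, unsigned long a6)
{
    (void)fd; (void)a4; (void)a5; (void)a6;
    model_last_buf = buf;
    return (long)len;      /* pretend everything was written */
}

/* Indexed by the 32-bit call number: 1 = exit, 4 = write in the i386 ABI. */
static const model_sys_fn model_ia32_table[] = { [1] = model_exit, [4] = model_write };

void model_do_syscall_32(struct model_pt_regs *regs)
{
    unsigned int nr = (unsigned int)regs->ax;   /* eax only; upper half of rax ignored */
    size_t n = sizeof model_ia32_table / sizeof model_ia32_table[0];
    long ret = -ENOSYS;

    if (nr < n && model_ia32_table[nr])         /* each arg is zero-extended from 32 bits */
        ret = model_ia32_table[nr]((unsigned int)regs->bx, (unsigned int)regs->cx,
                                   (unsigned int)regs->dx, (unsigned int)regs->si,
                                   (unsigned int)regs->di, (unsigned int)regs->bp);
    regs->ax = (uint64_t)ret;                   /* what user-space will see in rax */
}

void model_dispatch_demo(void)
{
    struct model_pt_regs r = { .ax = 0xdeadbeef00000004ULL, .bx = 1,
                               .cx = 0x7fffffffe550ULL, .dx = 12 };
    model_do_syscall_32(&r);
    assert(r.ax == 12);
    assert(model_last_buf == 0xffffe550ULL);    /* the stack pointer arrived truncated */
}
```

In the model, a 64-bit stack address reaches the handler as 0xffffe550, which is the truncation that makes a real write return -EFAULT.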
I'm planning to leave this answer describing the much simpler mechanism because the conceptually useful part here is that the kernel side of a syscall involves using EAX or RAX as an index into a table of function pointers, with other incoming register values copied to the places where the calling convention wants args to go. i.e. syscall is just a way to make a call into the kernel, to its dispatch code.

Simple example / test program:
I wrote a simple Hello World (in NASM syntax) which sets all registers to have non-zero upper halves, then makes two write() system calls with int 0x80, one with a pointer to a string in .rodata (succeeds), the second with a pointer to the stack (fails with -EFAULT).

Then it uses the native 64-bit syscall ABI to write() the chars from the stack (64-bit pointer), and again to exit.

So all of these examples are using the ABIs correctly, except for the 2nd int 0x80 which tries to pass a 64-bit pointer and has it truncated.

If you built it as a position-independent executable, the first one would fail too. (You'd have to use a RIP-relative lea instead of mov to get the address of hello: into a register.)

I used gdb, but use whatever debugger you prefer. Use one that highlights changed registers since the last single-step.
gdbgui works well for debugging asm source, but is not great for disassembly. Still, it does have a register pane that works well for integer regs at least, and it worked great on this example.

See the inline ;;; comments describing how registers are changed by system calls.

Build it into a 64-bit static binary, then run gdb ./abi32-from-64. In gdb, run set disassembly-flavor intel and layout reg if you don't have that in your ~/.gdbinit already. (GAS .intel_syntax is like MASM, not NASM, but they're close enough that it's easy to read if you like NASM syntax.)

Press control-L when gdb's TUI mode gets messed up. This happens easily, even when programs don't print to stdout themselves.