C calling convention: who cleans the stack in variadic functions vs normal functions?


There are several calling conventions (e.g. pascal, stdcall), but as far as I am concerned, C does use cdecl (C declaration). These conventions differ slightly in the order in which the caller pushes the parameters onto the stack, and in which party (caller or callee) does the cleanup.

Talking about the cleanup, here is my question. I do not understand: are there three different things?

  1. stack clean
  2. moving the pointer back to the penultimate stack frame
  3. stack restoration

Or how should I see them?

Also, the point of this question is basically how variadic functions could work in calling conventions like Pascal or stdcall, where the callee should clear / clean / restore (I don't know which operation) the stack, but doesn't know how many parameters it will receive.

EDIT

Why is the order in which parameters are pushed onto the stack so important? You still have the first parameter (a fixed parameter, not part of the ellipsis), which gives you information such as the number of variable arguments. There is also the "guardian" (sentinel), which can be passed as the last of the variable arguments and used as a marker for the end of the variable part, independent of the calling convention. In this link, why should both caller and callee restore the values of those registers if they both save their state before modifying them? Shouldn't only one of them (e.g. the caller) save them on the stack before calling the function, and that's all? Also, on the same link:

"So, the stack pointer ESP might go up and down, but the EBP register remains fixed. This is convenient because it means we can always refer to the first argument as [EBP + 8] regardless of how much pushing and popping is done in the function."

The pushed arguments and the local variables are contiguous in memory. What is the advantage of referring to them via EBP? There will never be a dynamic offset between them, even if the stack changes in size.

One of the materials I've read is this site (only the beginning), for a better understanding of what exactly a stack frame is. Then I went on YouTube and found these stack overview and call stack tutorials, but they somehow missed the part I needed. What exactly happens when you call a function? (I don't understand the "call address" instruction followed by the push of a value onto the stack that represents the return value.) Who controls what the return address will be? The caller? The callee? When the callee returns, does the program continue by executing an instruction that reads from a register, or what?

Answer by John Bollinger (accepted)

as far as I am concerned, C does use cdecl

Its name notwithstanding, the cdecl convention is not universal for C code, not even on the x86 architecture. It has the advantage of being simple to define and implement, but it makes no use of CPU registers for argument passing, even though passing arguments in registers is more efficient. That makes a difference even on register-starved x86, and a lot more difference on architectures with more available registers, such as x86_64.

Talking about the cleanup, here is my question. I do not understand: are there three different things?

  1. stack clean
  2. moving the pointer back to the penultimate stack frame
  3. stack restoration

Or how should I see them?

I would be inclined to interpret (1) and (3) as different ways of saying the same thing, but it is conceivable that someone would draw distinctions between them. (3) and related wording are what I encounter most frequently. (2) is not necessarily the same thing, because there may be two relevant stack values to be restored: the base of the stack frame (see below), and the top of the stack. The stack frame base is important in the event that the stack frame contains more information than argument and local variable values, such as the base of the previous stack frame.

Also, the point of this question is basically how variadic functions could work in calling conventions like Pascal or stdcall, where the callee should clear / clean / restore (I don't know which operation) the stack, but doesn't know how many parameters it will receive.

The stack is not necessarily the whole picture.

The callee cannot restore the stack if it does not know how to find the top of its caller's stack, and, if necessary, the base of its caller's stack frame. But in practice, this is usually hardware assisted.

Taking x86 (for which cdecl was designed) as an example, the CPU has registers for both the stack (frame) base and the current stack pointer. The caller's stack base is stored on the stack at a known offset (0) from the callee's stack base. Regardless of the number of arguments, the callee restores the stack by moving the top of the stack to its own stack base, and popping the value there to obtain the caller's stack base.
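
The sequence described above can be sketched as a toy C model of a 32-bit x86 stack. Everything here (the slot-index representation and the names push, prologue, epilogue, simulate_call) is a hypothetical illustration, not real machine code; the comments name the x86 instructions each step stands for. One detail the model makes visible: mov esp, ebp / pop ebp restores the caller's frame base and discards the callee's locals no matter how many arguments were passed, but the arguments themselves still sit above the return address, and removing them is exactly what the caller-pops (add esp, 12) versus callee-pops (ret 12) choice decides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a 32-bit x86 stack: each slot is one 4-byte
   word, the stack grows toward lower indices, and esp/ebp hold slot
   indices instead of byte addresses. */
enum { STACK_WORDS = 64 };
static uint32_t stack[STACK_WORDS];
static int esp = STACK_WORDS;   /* empty stack */
static int ebp = STACK_WORDS;

static void push(uint32_t v) { stack[--esp] = v; }
static uint32_t pop(void) { return stack[esp++]; }

/* Prologue: push ebp / mov ebp, esp / sub esp, 4*local_words */
static void prologue(int local_words) {
    push((uint32_t)ebp);
    ebp = esp;
    esp -= local_words;
}

/* Epilogue: mov esp, ebp / pop ebp / ret n.  arg_words_to_pop == 0 is a
   plain ret (caller pops, like cdecl); > 0 is ret 4*n (callee pops, like
   stdcall), where n must be fixed when the callee is compiled. */
static uint32_t epilogue(int arg_words_to_pop) {
    esp = ebp;                        /* drop locals and leftover pushes */
    ebp = (int)pop();                 /* saved caller EBP sits at [EBP + 0] */
    uint32_t return_address = pop();  /* ret pops the return address */
    esp += arg_words_to_pop;          /* ret n also drops the arguments */
    return return_address;
}

/* Simulates f(10, 20, 30); returns how far ESP drifted after the whole
   call sequence (0 means the caller's stack is fully restored). */
static int simulate_call(int callee_pops) {
    int esp_before = esp, ebp_before = ebp;
    push(30); push(20); push(10);     /* caller: args right to left */
    push(0x1234);                     /* call: push return address */
    prologue(2);
    push(99); push(98);               /* callee pushes: ESP moves, EBP doesn't */
    assert(stack[ebp + 2] == 10);     /* first arg at [EBP + 8] */
    assert(stack[ebp + 1] == 0x1234); /* return address at [EBP + 4] */
    uint32_t ra = epilogue(callee_pops ? 3 : 0);
    assert(ra == 0x1234);
    assert(ebp == ebp_before);        /* caller's frame base restored */
    (void)ra;
    (void)ebp_before;
    if (!callee_pops)
        esp += 3;                     /* caller: add esp, 12 */
    return esp - esp_before;
}
```

simulate_call(0) and simulate_call(1) both return 0: in either convention ESP ends where it started, and only the party that removes the three argument slots differs. The asserts inside also show why EBP is convenient: the callee's own pushes move ESP, but the first argument stays at [EBP + 8].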

It is conceivable, however, that there is a calling convention in use somewhere that affords no way to restore the stack to a chosen previous state other than to pop elements one at a time, that does not explicitly convey the number of arguments to the called function, and that requires the callee to restore the caller's stack. Such a calling convention would not support variadic functions.
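
At the C level, a variadic function learns how many arguments it received only from information the caller passes explicitly, such as the leading count or the terminating "guardian" (sentinel) mentioned in the question. A minimal sketch using the standard <stdarg.h> macros; the function names sum_counted and count_strings are hypothetical:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* The fixed first parameter tells the callee how many int arguments follow. */
static int sum_counted(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* A sentinel (here a null pointer) marks the end of the variable part. */
static size_t count_strings(const char *first, ...) {
    size_t n = 0;
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

For example, sum_counted(3, 1, 2, 3) returns 6 and count_strings("a", "b", (const char *)NULL) returns 2. The cast on the sentinel matters, because NULL may be defined as a plain 0, which is passed as an int rather than a pointer. In both cases va_arg also relies on caller and callee agreeing on each argument's type, not just the count.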

Why is the order in which parameters are pushed onto the stack so important?

The order is not important in any general sense, but it is essential for caller and callee, which may be compiled separately, to agree about it. Otherwise, the callee cannot match the passed values with the parameters they are intended for. Thus, to whatever extent a calling convention relies on the stack, it must specify precisely which arguments are passed there, and in which order.
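
To make the agreement point concrete, here is a toy C model (hypothetical names; int32_t slots stand in for 4-byte stack words) in which a callee compiled for right-to-left pushes simply reads whatever happens to be in its parameter slots:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack of 4-byte slots that grows toward lower indices. */
typedef struct { int32_t slot[16]; int top; } ToyStack;

static void push(ToyStack *s, int32_t v) { s->slot[--s->top] = v; }

/* Caller side of f(a, b): push the arguments in some order. */
static void push_args(ToyStack *s, int32_t a, int32_t b, int right_to_left) {
    if (right_to_left) { push(s, b); push(s, a); }
    else               { push(s, a); push(s, b); }
}

/* Callee side of int f(int a, int b) { return a - b; }, compiled assuming
   right-to-left pushes: a is at the lowest address, b in the next slot. */
static int32_t callee_f(const ToyStack *s) {
    int32_t a = s->slot[s->top];
    int32_t b = s->slot[s->top + 1];
    return a - b;
}

/* Runs one "call" with the given push order and returns the callee's result. */
static int32_t call_f(int32_t a, int32_t b, int right_to_left) {
    ToyStack s = { .top = 16 };
    push_args(&s, a, b, right_to_left);
    return callee_f(&s);
}
```

call_f(10, 3, 1) returns 7, while call_f(10, 3, 0), where the caller pushes left to right, returns -7: nothing fails loudly, the callee just matches the values to the wrong parameters.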

Regarding stack frames: this is more material that is not specified by C and that varies, at least to some extent. Conceptually, though, the stack frame of a function call is the portion of the stack that provides execution context for that call. It typically supplies storage for local variables, and it may contain additional information, such as a return address and / or the value of the caller's stack frame pointer. It might also contain other per-function-call information appropriate for the execution environment. Details are part of the calling convention in use.

Answer by Peter Cordes

Note that in practice no mainstream systems ever use callee-pops-args conventions for variadic functions. They all use caller-pops, so the callee doesn't need to know the number of args. It would not be impossible to do callee-pops, but it would generally not be worth the trouble.

For example, in 32-bit code for Windows, I think stdcall is the default for many Windows DLL functions, but variadic ones use cdecl. (Non-Windows x86 systems like Linux and MacOS typically use caller-pops calling conventions by default, for all functions. So this really only comes up for 32-bit Windows if we're talking about mainstream systems.)

So printf doesn't have to count up the size of args referenced by the format string (or receive a count passed by the caller) and then emulate a ret 12 or ret 8 or whatever. ret n is only available in machine code with an immediate operand, so you can't do ret ecx or something. It's possible to emulate a variable-count ret n in various ways; one of the least bad would be copying the return address higher up on the stack and adjusting ESP before a plain ret. But that's still pretty inefficient compared to just using a caller-pops convention.
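
The "copy the return address higher up" trick can be sketched with a toy stack in C. This is a hypothetical model (ToyStack, ret_variable and esp_drift_after_call are illustrative names, and esp counts 4-byte slots rather than bytes), not code any real compiler emits for printf:

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS = 32 };

/* Hypothetical toy stack of 4-byte slots; esp is a slot index and the
   stack grows toward lower indices, as on x86. */
typedef struct { uint32_t slot[WORDS]; int esp; } ToyStack;

static void push(ToyStack *s, uint32_t v) { s->slot[--s->esp] = v; }
static uint32_t pop(ToyStack *s) { return s->slot[s->esp++]; }

/* Emulates "ret n" with n known only at run time: copy the return address
   over the highest argument slot, move ESP up to that slot, then do a
   plain ret (pop the return address). */
static uint32_t ret_variable(ToyStack *s, int nargs) {
    if (nargs > 0) {
        uint32_t ra = s->slot[s->esp];    /* return address is at [ESP] */
        s->slot[s->esp + nargs] = ra;     /* overwrite the last argument */
        s->esp += nargs;                  /* like add esp, 4*nargs */
    }
    return pop(s);                        /* plain ret */
}

/* The caller pushes nargs arguments (right to left) and a return address;
   the callee cleans up with ret_variable.  Returns how far ESP ends up
   from where it started (0 means fully restored).  nargs must be < WORDS. */
static int esp_drift_after_call(int nargs) {
    ToyStack s = { .esp = WORDS };
    for (int i = nargs; i >= 1; i--)
        push(&s, (uint32_t)i);
    push(&s, 0xABCDu);                    /* call pushes the return address */
    uint32_t ra = ret_variable(&s, nargs);
    assert(ra == 0xABCDu);
    (void)ra;
    return s.esp - WORDS;
}
```

esp_drift_after_call(3) returns 0, so the callee removed all three argument slots even though the count was only known at run time, at the cost of extra loads and stores that a caller-pops convention avoids entirely.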

Also, that would break some valid C programs: passing an unused arg to printf is not UB in ISO C; it's required to be safely ignored. Some code presumably depends on it being silently ignored. (By accident or because the types mismatched, or because they always pass some args, but pass a different format string that might print a simpler or more complex output.)
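
A small C sketch of both points (format_item is a hypothetical helper): the same argument list is passed with either a short or a detailed format string, and the short format simply ignores the trailing int. Compilers may still warn about extra arguments when the format string is a literal (for example GCC's -Wformat-extra-args), but the behavior is well defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats name and value with one of two format
   strings.  The short format consumes only the string; ISO C requires the
   unused int argument to be evaluated and then ignored.  The static buffer
   makes this non-reentrant, which is acceptable for a demo. */
static const char *format_item(int verbose, const char *name, int value) {
    static char buf[64];
    const char *fmt = verbose ? "%s = %d" : "%s";
    snprintf(buf, sizeof buf, fmt, name, value);
    return buf;
}
```

format_item(0, "apples", 3) produces "apples" and format_item(1, "apples", 3) produces "apples = 3".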

Windows also makes sure caller and callee agree on how much stack space the callee will pop by "decorating" asm symbol names, e.g. _foo@12 for a function like int foo(int, int, int) (three int args = 12 bytes of stack space for a pure stack-args convention). So if you declare it wrong (or don't declare it at all, and the implicit declaration uses larger types), you'll get a link error instead of a hard-to-debug error that might only show up in optimized builds. (The bug can go unnoticed in a debug build that uses EBP as a frame pointer, if that happens to correct for the stack mismatch before anything can go wrong.)
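
The @N suffix is just the total stack space of the arguments, with each argument rounded up to a multiple of 4 bytes on 32-bit x86. A hypothetical C helper that computes such a name from argument sizes (it only illustrates the rule; it is not how a real compiler or linker implements decoration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a 32-bit stdcall-style decorated name
   _name@N, where N is the byte size of the argument list and each
   argument occupies a multiple of 4 bytes on the stack. */
static const char *stdcall_name(const char *name, const size_t *arg_sizes, int nargs) {
    static char buf[128];                    /* not thread-safe; fine for a demo */
    size_t bytes = 0;
    for (int i = 0; i < nargs; i++)
        bytes += (arg_sizes[i] + 3) / 4 * 4; /* round each arg up to 4 bytes */
    snprintf(buf, sizeof buf, "_%s@%zu", name, bytes);
    return buf;
}
```

For int foo(int, int, int) it yields _foo@12; for a function bar(char, double), the char still occupies a full 4-byte slot, giving _bar@12.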

Calling convention mismatch and other asm bugs lead to breakage "below" the C / C++ level, and can be very hard to debug, especially for people that are only looking at C variables in a debugger or with debug-prints. (Same thing for misuse of GNU C inline asm.)

As @johnfound said, the key point with calling conventions is that caller and callee agree on the rules. Any unambiguous set of rules works as long as both parties agree.

Good (efficient) calling conventions (e.g. x86-64 System V, and to a lesser extent Windows x64 and 32-bit fastcall/vectorcall) will pass the first few args in registers, avoiding the store/reload to the stack or any stack manipulation for simple functions. Efficient calling conventions also have a good mix of call-preserved and call-clobbered registers. Simple calling conventions pass everything on the stack, with caller or callee responsible for popping the args. Even simpler ones (like Irvine32 for asm beginners) preserve all the registers.

For good details, see Agner Fog's calling conventions guide.