My solution has an unmanaged C++ DLL, which exports a function, and a managed application that PInvokes this function.
I've just converted the solution from .NET 3.5 to .NET 4.0 and got this PInvokeStackImbalance "A call to PInvoke function [...] has unbalanced the stack" exception. As it turned out, I was calling a __cdecl function as if it were __stdcall:
C++ part (callee):
__declspec(dllexport) double TestFunction(int param1, int param2); // by default is __cdecl
C# part (caller):
[DllImport("TestLib.dll")] // by default is CallingConvention.StdCall
private static extern double TestFunction(int param1, int param2);
So I've fixed the bug, but now I'm curious: how did this work in .NET 3.5? Why did the (many times repeated) situation in which nobody (neither callee nor caller) cleans the stack not cause a stack overflow or some other misbehavior, but just work OK? Is there some sort of check in PInvoke, like the one mentioned by Raymond Chen in his article? It's also interesting why the opposite kind of convention mismatch (a __stdcall callee PInvoked as if it were __cdecl) does not work at all and just causes an EntryPointNotFoundException.
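A possible explanation for the EntryPointNotFoundException, offered as an assumption rather than something established in this post: on 32-bit x86, MSVC decorates an extern "C" __stdcall function as _name@N, where N is the number of argument bytes, while an exported extern "C" __cdecl function keeps its plain name. If the DLL exports only the decorated name and the runtime, told that the function is Cdecl, looks only for the plain name, the lookup would fail before any call is made. The helper below is hypothetical and only builds the decorated name for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the 32-bit x86 MSVC decoration of an
// extern "C" __stdcall function, i.e. "_" + name + "@" + argument bytes.
// Each int parameter occupies 4 bytes on the stack.
std::string stdcall_decorated_name(const std::string& name,
                                   std::size_t arg_bytes) {
    return "_" + name + "@" + std::to_string(arg_bytes);
}
```

Under that assumption, a __stdcall TestFunction(int, int) would be exported as stdcall_decorated_name("TestFunction", 8), which is "_TestFunction@8" and not "TestFunction".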
After some investigation:
The helper that saves the situation from crashing is another register: EBP, the base pointer, which points to the beginning of the stack frame. All access to a function's local variables goes through this pointer (except in optimized code, see the edit below). Before the function returns, the stack pointer is reset to the base pointer's value.
Before a function (say, the PInvoke stub) calls another function (the imported DLL's function), the stack pointer points to the end of the caller's local variables. The caller then pushes the parameters onto the stack and calls the other function.
In the situation described, where a function calls another as if it were __stdcall while it is actually __cdecl, nobody removes these parameters from the stack. So after the callee returns, the stack pointer points to the end of the pushed parameter block. It is as if the caller (PInvoke) had just acquired several more local variables.
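The mechanism described above can be sketched with a toy model of the stack. This is only a simulation under simplifying assumptions, not the real PInvoke stub: the ToyStack type, the slot-based addressing, the fake return address, and the values 10, 20, and 1234 are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit x86 stack: it grows toward lower indices,
// and one slot stands for 4 bytes.
struct ToyStack {
    std::vector<std::int32_t> mem;
    std::size_t esp;  // index of the current top of the stack
    std::size_t ebp;  // base of the current frame

    explicit ToyStack(std::size_t slots)
        : mem(slots, 0), esp(slots), ebp(slots) {}

    void push(std::int32_t v) { mem[--esp] = v; }
    std::int32_t pop() { return mem[esp++]; }
};

// A __cdecl-style callee: reads its arguments and leaves them on the stack.
std::int32_t cdecl_callee(ToyStack& s) {
    // On entry: [esp] = return address, [esp+1] = param1, [esp+2] = param2.
    std::int32_t result = s.mem[s.esp + 1] + s.mem[s.esp + 2];
    s.pop();  // 'ret' removes only the return address
    return result;
}

struct Outcome {
    std::int32_t result;
    std::size_t leaked_slots;        // slots left behind right after the call
    std::int32_t local_after_call;   // local read through EBP after the call
    bool balanced_after_return;      // ESP restored by the epilogue?
};

// A caller that wrongly assumes the callee is __stdcall,
// so it never removes the arguments it pushed.
Outcome mismatched_caller(ToyStack& s) {
    const std::size_t esp_on_entry = s.esp;

    // Prologue: push ebp; mov ebp, esp; reserve one local.
    s.push(static_cast<std::int32_t>(s.ebp));
    s.ebp = s.esp;
    s.push(0);
    s.mem[s.ebp - 1] = 1234;  // the local lives at [ebp - 1]
    const std::size_t esp_before_call = s.esp;

    // Call: push arguments right to left, then the return address.
    s.push(20);      // param2
    s.push(10);      // param1
    s.push(0xC0DE);  // fake return address
    Outcome out;
    out.result = cdecl_callee(s);
    // No cleanup here: the caller believes the callee already did it.

    out.leaked_slots = esp_before_call - s.esp;
    out.local_after_call = s.mem[s.ebp - 1];

    // Epilogue: mov esp, ebp; pop ebp. This discards the leaked arguments.
    s.esp = s.ebp;
    s.ebp = static_cast<std::size_t>(s.pop());
    out.balanced_after_return = (s.esp == esp_on_entry);
    return out;
}
```

Calling mismatched_caller on a fresh ToyStack(64) leaves two slots behind right after the call, yet the local read through EBP is still 1234, and the epilogue brings ESP back to its value on entry.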
Since the caller's local variables are accessed through the base pointer, nothing breaks. The only bad thing that can happen is if the callee is called many times in a row from within the same stack frame: the stack then grows with each call and may overflow. But since PInvoke calls the DLL's function only once and then returns, the stack pointer is simply reset to the base pointer, and all is well.

Edit: As noted here, the code may also be optimized to keep local variables in CPU registers only. In that case EBP is not used, so an invalid ESP may cause a return to an invalid address.
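The difference between many mismatched calls inside one frame and one mismatched call per frame can be illustrated with plain arithmetic on a simulated ESP. This is a rough sketch under assumptions: the starting address 0x0012FF00 and both function names are invented, and only the argument bytes are tracked.

```cpp
#include <cstdint>

// Many mismatched calls made from a single frame: every call leaves its
// arguments behind, and no epilogue runs in between to discard them.
std::uint32_t leaked_bytes_in_one_frame(unsigned calls,
                                        std::uint32_t arg_bytes) {
    const std::uint32_t ebp = 0x0012FF00;  // made-up frame base
    std::uint32_t esp = ebp;
    for (unsigned i = 0; i < calls; ++i) {
        esp -= arg_bytes;  // caller pushes the arguments
        // call/ret balance each other; nobody removes the arguments
    }
    return ebp - esp;  // bytes still occupied when the loop ends
}

// One mismatched call per frame (the PInvoke case described above):
// each wrapper's epilogue resets ESP to EBP before returning.
std::uint32_t leaked_bytes_with_frame_per_call(unsigned calls,
                                               std::uint32_t arg_bytes) {
    const std::uint32_t esp_start = 0x0012FF00;  // made-up stack top
    std::uint32_t esp = esp_start;
    for (unsigned i = 0; i < calls; ++i) {
        const std::uint32_t ebp = esp;  // prologue: mov ebp, esp
        esp -= arg_bytes;               // arguments pushed, never removed
        esp = ebp;                      // epilogue: mov esp, ebp
    }
    return esp_start - esp;
}
```

With two int parameters (8 bytes), 1000 calls from a single frame leave 8000 bytes on the stack, while 1000 calls that each go through their own frame leave none.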