According to Wikipedia: "Microsoft or GCC __fastcall convention (aka __msfastcall) passes the first two arguments (evaluated left to right) that fit into ECX and EDX. Remaining arguments are pushed onto the stack from right to left."
Why did they decide against using EAX, ECX, EDX for arg0, arg1, arg2? If they're going to pass arguments in registers, why stop at two? I know Borland's fastcall DOES do this, so did Microsoft choose not to use EAX just to be different?
Due to a limitation of the x86 instruction set, there is no
CALL absolute_immediate
instruction; the direct form,
CALL rel32,
is relative to the instruction pointer. Of course, compiler writers would still like a call to an absolute address, and there could have been a request to processor makers for one, hence we have the following "compromise":
MOV eax, absolute_address(label)
CALL eax
which would be equivalent to a hypothetical
CALL absolute_address(label)
This calling method requires one temporary register just for the call. That register can easily be reused afterwards, and EAX is the best candidate for the purpose.
The result of these considerations is useful, and you can apply it in your own asm code. Benchmarks show that branch predictors will, at least partially, cope with such indirect calls. Another, rarer, possibility is that you have to reset the register to avoid a dependency penalty or a partial register stall when calling a procedure that is already in the top-level cache. That can happen, for example, if the subroutine starts with something like mov ah, 1. To avoid it, use EAX as the temporary register and put
XOR eax, eax
immediately before the CALL. This may actually save a few clock cycles in some rare cases. But the benefit of reserving EAX for this versus using it for parameter passing is doubtful, and the reasons may be as stated above.