Why does some Windows bootloader code zero registers with `sub` instead of `xor`?

Question

124 views Asked by MichaelK At 20 January 2021 at 22:15

Given considerations such as those detailed in https://stackoverflow.com/a/33668295, it seems `xor reg, reg` is the best way to zero a register. But when I examine real-world assembly code (such as Windows bootloader code, IIRC), I see both `xor reg, reg` and `sub reg, reg` used.

Why is `sub` used at all for this purpose? Are there any reasons to prefer `sub` in some special cases? For example, does it set flags differently from `xor`?

There are 2 answers

fuz On 20 January 2021 at 22:17

Both `xor reg, reg` and `sub reg, reg` are recognised as zeroing idioms on many modern x86 processors. The effect is the same for both and there is no advantage in using one over the other.

**Peter Cordes** · Accepted Answer · 2021-01-20T22:39:41+00:00

Differences:

- `sub reg,reg` is documented to set AF=0 (the BCD half-carry flag, from bit 3 to bit 4). `xor` leaves AF undefined. The architectural effect is otherwise exactly identical, leaving only possible performance differences. AF almost never matters, usually only if the next instruction is `aaa` or something.
- sub-zeroing is slower than xor-zeroing on a few CPUs (e.g. Silvermont, as pointed out in my answer you linked), but the same performance on most. And of course both have the same 2-byte size.
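
To make the flag difference concrete, here is a small Python model of the status flags after a 32-bit `sub` and `xor`. It is only a sketch based on the flag descriptions in Intel's instruction reference, not a CPU emulator; the function names are made up for this example, and `None` is used to stand for "architecturally undefined".

```python
MASK32 = 0xFFFFFFFF


def even_parity(result):
    """PF is set when the low byte of the result has an even number of 1 bits."""
    return bin(result & 0xFF).count("1") % 2 == 0


def flags_after_sub(a, b):
    """Status flags after a 32-bit `sub a, b` (all six are defined)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": a < b,                                 # borrow out of bit 31
        "PF": even_parity(result),
        "AF": (a & 0xF) < (b & 0xF),                 # borrow from bit 4 into bit 3
        "ZF": result == 0,
        "SF": bool(result >> 31),
        "OF": bool(((a ^ b) & (a ^ result)) >> 31),  # signed overflow
    }


def flags_after_xor(a, b):
    """Status flags after a 32-bit `xor a, b` (AF is undefined, modelled as None)."""
    result = (a ^ b) & MASK32
    return {
        "CF": False,                                 # always cleared by xor
        "PF": even_parity(result),
        "AF": None,                                  # undefined by the architecture
        "ZF": result == 0,
        "SF": bool(result >> 31),
        "OF": False,                                 # always cleared by xor
    }


# Zeroing a register means both operands are the same value.
for value in (0, 1, 0xF, 0x10, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    sub_flags = flags_after_sub(value, value)
    xor_flags = flags_after_xor(value, value)
    differing = [name for name in sub_flags if sub_flags[name] != xor_flags[name]]
    print(f"{value:#010x}: sub AF={sub_flags['AF']}, xor AF={xor_flags['AF']}, differ in {differing}")
```

Whatever value the register held, subtracting it from itself never borrows out of the low nibble, so the model always gives AF=0 for `sub`, while the only flag it cannot pin down for `xor` is AF.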

I'd guess it's just different authors of hand-written asm, some of them preferring `sub`, probably without realizing that some CPUs only special-case `xor`. The exception would be cases where they want to guarantee that AF is cleared, where `sub` might be intentional: for example, when initializing things and wanting a fully known state for EFLAGS before something that might use `pushf`.

`xor` leaving AF undefined still means AF will be either 0 or 1; you just don't know which. (This is not like undefined behaviour in C.) The actual result could depend on the CPU model, the input values, or possibly even some stray bits somewhere.
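
As a rough illustration of what "fully known state" means here, the Python sketch below packs the six status flags into their EFLAGS bit positions and lists the values that code reading the flags (for example via `pushf`) could see in those bits. It models the status flags plus the always-set bit 1 only; other EFLAGS bits (IF, DF and so on) are not touched by either instruction and are left out. The helper name `status_bits` is an assumption for this example.

```python
# Bit positions of the status flags within EFLAGS.
CF, PF, AF, ZF, SF, OF = 0, 2, 4, 6, 7, 11
RESERVED_BIT1 = 1 << 1  # bit 1 of EFLAGS always reads as 1


def status_bits(cf, pf, af, zf, sf, of):
    """Pack the six status flags into their EFLAGS bit positions."""
    value = RESERVED_BIT1
    for bit, flag in ((CF, cf), (PF, pf), (AF, af), (ZF, zf), (SF, sf), (OF, of)):
        if flag:
            value |= 1 << bit
    return value


# After sub reg, reg: the result is 0, so ZF=1 and PF=1; every other
# status flag, AF included, is defined to be 0.
after_sub = {status_bits(cf=0, pf=1, af=0, zf=1, sf=0, of=0)}

# After xor reg, reg: the same, except that AF may be either value.
after_xor = {status_bits(cf=0, pf=1, af=af, zf=1, sf=0, of=0) for af in (0, 1)}

print(sorted(hex(v) for v in after_sub))  # ['0x46']
print(sorted(hex(v) for v in after_xor))  # ['0x46', '0x56']
```

So the status bits have exactly one possible image after `sub`, and two after `xor` as far as the architecture guarantees.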

On modern CPUs that recognize `sub` as a zeroing idiom, AF will be zero after `xor` too, so the CPU can handle xor-zeroing and sub-zeroing exactly identically, including the FLAGS result.
