I'm currently trying to figure out how to add the first byte in memory pointed to by the pointer register SI to the current contents of the AX register.
So if SI holds some address, and the values in memory at that address are 00 and 01, I'm looking to add just 00 to the AX register.
The first instruction my assembly-noobish self tried was `add ax, byte ptr [SI]`, but of course, no dice, as I'm trying to add operands of different sizes.
My current workaround is
mov dx,0000h ;empty the contents of dx
mov dl,byte ptr [si] ;get the value of the first byte in a register
add ax,dx ;perform the originally desired addition
But this is incredibly wasteful and really hurts my executed instructions count (this is part of a subroutine that runs many times).
I'm limited to the 8086 instruction set, so this question/answer by Peter Cordes, which suggests `movzx` to condense my first two lines, is unfortunately not viable.
As you say, if you can assume a 386-compatible CPU, a good option (especially for modern CPUs) is `movzx dx, byte ptr [mem]` / `add ax, dx`.

If not, I guess we can pretend we're tuning for a real 8086, where code size in bytes is often more important than instruction count. (Especially on 8088, with its 8-bit bus.) So you definitely want to use `xor dx, dx` to zero DX (2 bytes instead of 3 for `mov reg, imm16`), if you can't avoid a zeroing instruction altogether.

Hoist the zeroing of DX (or DH) out of any loop, so you just `mov dl, [mem]` / `add ax, dx`. If the function only does it once, you may need to (manually) inline the function in call sites that call it in a loop, if it's small enough for that to make sense. Or pick a register where callers are responsible for having the upper half zero.

As Raymond says, you can pick any other register whose high half you know to be zero at that point in your function. Perhaps you could `mov cx, 4` instead of `mov cl, 4` if you happened to need CL=4 for something else earlier, but you're done with CX by the time you need to add into AX. `mov cx, 4` is only 1 byte longer, so you get CH zeroed with only 1 extra byte of code-size. (vs. `xor cx, cx`, which costs 2 bytes.)

Another option is byte add/adc, but that isn't ideal for code size. (Or performance on later CPUs.)
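A minimal sketch of that byte add/adc pair, assuming the pointer is in SI as in the question. The byte counts are for the plain `[si]` addressing mode; a displacement would make the first instruction longer.

```asm
        add     al, [si]        ; 2 bytes: AL += byte at DS:SI, CF = carry out of AL
        adc     ah, 0           ; 3 bytes: propagate that carry into the high byte
```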
So that's 1 byte more than if you already had a spare upper-zeroed register (5 bytes for add/adc vs. 4 bytes for `mov dl, [si]` / `add ax, dx`).
But on the plus side, add/adc doesn't need any extra register at all.
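For comparison, here is a sketch of the spare-register approach with the zeroing hoisted out of a loop. The routine name `sum_hoisted`, the use of CX as a byte count, and the loop structure are assumptions for illustration; the question only shows the three-instruction sequence, not the surrounding subroutine.

```asm
; sum_hoisted: hypothetical routine, not from the question
; in:   SI = pointer to the bytes, CX = number of bytes, AX = starting total
; out:  AX = starting total + sum of the bytes (mod 65536)
; uses: CX, DX, SI
sum_hoisted proc near
        jcxz    hoist_done      ; nothing to add if the count is zero
        xor     dx, dx          ; 2 bytes: zero DX once, outside the loop
hoist_next:
        mov     dl, [si]        ; 2 bytes: DH is still 0, so DX = zero-extended byte
        add     ax, dx          ; 2 bytes: the 16-bit addition
        inc     si              ; advance to the next byte
        loop    hoist_next      ; dec cx / jnz
hoist_done:
        ret
sum_hoisted endp
```

Inside the loop, only the 4 bytes of `mov` / `add` are spent on the widening addition itself; the `xor` is paid once per call.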
With the pointer in SI, it's worth looking for ways to take advantage of `lodsb` if you're really optimizing for code-size. That does `mov al, [si]` / `inc si` (or instead `dec si` if DF=1), but without affecting FLAGS. So you'd want to add into a different register. `xchg ax, reg` is only 1 byte, but if you need two swaps it may not pay for itself if you actually have to return in AX, not some other register.
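A rough sketch of how `lodsb` could fit in, keeping the running total in DX and swapping it back into AX at the end. The routine name `sum_lodsb`, the CX byte count, and the choice of DX are assumptions for illustration; whether the two `xchg` instructions pay for themselves depends on the surrounding code.

```asm
; sum_lodsb: hypothetical routine, not from the question
; in:   SI = pointer to the bytes, CX = number of bytes,
;       AX = starting total, DF = 0 (so lodsb increments SI)
; out:  AX = starting total + sum of the bytes (mod 65536)
; uses: CX, DX, SI
sum_lodsb proc near
        xchg    ax, dx          ; 1 byte: move the running total into DX
        jcxz    lods_done       ; nothing to add if the count is zero
        xor     ah, ah          ; 2 bytes: lodsb only writes AL, so AH stays 0
lods_next:
        lodsb                   ; 1 byte: AL = [si], then SI = SI + 1, FLAGS untouched
        add     dx, ax          ; 2 bytes: AX is the zero-extended byte
        loop    lods_next
lods_done:
        xchg    ax, dx          ; 1 byte: return the total in AX
        ret
sum_lodsb endp
```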