Is it possible to eliminate these redundant trailing zeros?

I am writing some very tightly confined ASM code.

Notice this group of opcodes generated by NASM:

8AA4241C020000    mov ah,[esp+0x21c]

And the similar:

051C020000 add eax,0x21c ; 4 extra 0's! 
8D84241C020000    lea eax,[esp+0x21c] ; Brutal! 

Is there any way to communicate to the processor that you intend to apply a 15bit offset to a 32 bit register, and let it figure out the 0 padding for itself?

I've been combing through https://c9x.me/x86/html/file_module_x86_id_176.html for some guidance. The extra 2 bytes here or there would really save my life!

Also accepted:

Alternative ways to rewrite the statement to make it smaller, ultimately what I'm going for in this instance is something like:

mov eax,[esp+0x21c]
push eax 

If there is a way to hand encode that to make it SUPER tiny, I'd love to see the technique.

There are 2 answers

Cody Gray - on strike

Is there any way to communicate to the processor that you intend to apply a 15bit offset to a 32 bit register, and let it figure out the 0 padding for itself?

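No. The available instruction encodings are documented in the Intel manuals (available in various places online; see the links in the tag wiki). In 32-bit addressing, the displacement in a memory operand is either 8 bits (sign-extended, so it covers -128 to +127) or a full 32 bits; its size is determined by the addressing mode, not by the size of the destination register. 16-bit displacements exist only in 16-bit addressing, which cannot use ESP as a base register at all. Since 0x21c does not fit in a signed byte, the full 32-bit form is the only option. There is no way to ever get a 15-bit offset.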
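To make the displacement rule concrete, here is a minimal sketch that hand-encodes `mov ah,[esp+disp]` for 32-bit mode. The helper name `encode_mov_ah_esp` is made up for illustration, and it covers only this one instruction form; it is not how NASM is implemented.

```python
import struct


def encode_mov_ah_esp(disp: int) -> bytes:
    """Hand-encode `mov ah, [esp + disp]` (hypothetical helper, 32-bit mode only)."""
    opcode = 0x8A   # MOV r8, r/m8
    reg_ah = 0b100  # AH in the ModRM.reg field
    rm_sib = 0b100  # ModRM.rm = 100b means "a SIB byte follows"
    sib = 0x24      # scale=1, no index, base=ESP

    if disp == 0:
        mod, tail = 0b00, b""                      # no displacement bytes
    elif -128 <= disp <= 127:
        mod, tail = 0b01, struct.pack("<b", disp)  # disp8, sign-extended by the CPU
    else:
        mod, tail = 0b10, struct.pack("<i", disp)  # disp32, the only other choice

    modrm = (mod << 6) | (reg_ah << 3) | rm_sib
    return bytes([opcode, modrm, sib]) + tail


print(encode_mov_ah_esp(0x21C).hex())  # 8aa4241c020000 (7 bytes, as NASM emitted)
print(encode_mov_ah_esp(0x7C).hex())   # 8a64247c (4 bytes, fits in disp8)
```

The `mod` field has only these three memory forms, which is why an offset of 0x21c jumps straight from the 1-byte displacement to the 4-byte one.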

As Raymond Chen says, "it's not like you can just make up [your own custom encoding]".

There is a sign extension bit for some instructions, in some modes.

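Sure, but I don't see how this would help you. Your goal is to reduce the size of the instruction. The prefix that changes how the offset is interpreted is the address-size prefix (67h), not the operand-size prefix; it costs a byte of its own, and the 16-bit addressing it selects cannot use ESP as a base register, so it isn't going to help you do that.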

In general, if there were a shorter way to encode the instruction that was equivalent to the original, the assembler would be emitting that encoding for you. Certainly NASM would, with its multi-pass optimization option (enabled by default).

The extra 2 bytes here or there would really save my life!

This isn't one of the places where you can effectively save.

As David Wohlferd already suggested, if you're doing this repeatedly, you may be able to compress the code size slightly by pre-clearing a register (XOR reg, reg; 2 bytes), using this as the source register for reg-reg MOVs (which are only 2 bytes each), and then doing 16-bit MOVs into those registers that already have their upper 16 bits cleared.

When dealing with ISAs that have plenty of registers, it is relatively common practice to dedicate one to contain 0 in the context of a particular procedure. Many ISAs take this even further by having a dedicated zero register. You can do this with x86, too, but it's usually a pessimization, given how register-constrained the ISA is. But if you're optimizing for size above all else, it might sometimes make sense. (Then again, it might not, since it might force you to spill to memory, and that will bloat the code by at least 2 bytes for each store and load.)

In reality, I'm betting there are plenty of other places in your code where you're being spendthrift with instruction sizes and much more significant reductions can be achieved. If you want a review of the code with an eye towards reducing its size, consider posting a question over on Code Review (assuming that you have working code, of course).

I'm not really sure under what circumstances you would be writing code where a savings of 2 bytes would be significant. Maybe you're writing a boot loader that needs to fit within 512 bytes? In that case, what most people do is write a multiple stage bootloader, where the first stage, the one that is constrained to only 512 bytes, simply calls the second stage, where you have no such limitations.

Ped7g

If you have a register whose upper 24 bits are already zero, then (taking eax as that register) it is possible to shave 2 bytes off:

; costs 2 more bytes and cancels the saving if you don't already have a zeroed register
; 31 c0                   xor    eax,eax

; 5 byte fetch of value
b0 87                   mov    al,0x87
8a 24 84                mov    ah,BYTE PTR [esp+eax*4] 

Or, if you know that some other register holds a low value in the range 104..540 (only some of these are suitable, because the remaining offset must fit in a signed byte), you can fold it into the address and shrink the displacement. For example, say you know that ebx == 104:

8a 64 9c 7c             mov    ah,BYTE PTR [esp+ebx*4+(0x21C-104*4)]
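The arithmetic behind both listings can be checked with a small sketch. The helper `encode_mov_ah_esp_index` and the register-number constants are hypothetical names introduced here; the sketch only handles `mov ah,[esp+index*scale+disp]` in 32-bit mode.

```python
def encode_mov_ah_esp_index(index_reg: int, scale: int, disp: int) -> bytes:
    """Hand-encode `mov ah, [esp + index*scale + disp]` (hypothetical helper)."""
    if index_reg == 0b100:
        raise ValueError("ESP cannot be used as an index register")
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    sib = (scale_bits << 6) | (index_reg << 3) | 0b100  # base = ESP

    if disp == 0:
        mod, tail = 0b00, b""
    elif -128 <= disp <= 127:
        mod, tail = 0b01, disp.to_bytes(1, "little", signed=True)  # disp8
    else:
        mod, tail = 0b10, disp.to_bytes(4, "little", signed=True)  # disp32

    modrm = (mod << 6) | (0b100 << 3) | 0b100  # reg = AH, rm = SIB follows
    return bytes([0x8A, modrm, sib]) + tail


EAX, EBX = 0, 3   # x86 register numbers
TARGET = 0x21C    # the offset from the question

# First trick: eax = 0x87 and a scale of 4 reproduce the offset with no displacement.
print(0x87 * 4 == TARGET)                                         # True
print(encode_mov_ah_esp_index(EAX, 4, 0).hex())                   # 8a2484

# Second trick: ebx == 104, so the leftover offset is 0x21C - 416 = 0x7C (disp8).
print(encode_mov_ah_esp_index(EBX, 4, TARGET - 104 * 4).hex())    # 8a649c7c
```

Both forms only pay off when the register already holds the needed value for free; otherwise the instruction that sets it up eats the saving.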

If this is a real size challenge, you will have to post the whole code, because there are very often crazy ways to save size in unexpected and almost unimaginable places.