Determining when NASM can infer the size of the mov operation


Something has confused me in x86 assembly for a while: how and when can NASM infer the size of an operation? Here's an example:

mov ebx, [eax]

Here we are moving the 4 bytes stored at the address held in eax into ebx. The size of the operation is inferred as 4 bytes because the register is 32 bits.

However, this operation doesn't get inferred and throws a compile error:

mov [eax], 123456

Of course the solution is this:

mov dword [eax], 123456

This stores the 32-bit representation of the number 123456 into the four bytes at the address held in eax.

But this confuses me. Surely it can see that eax is 32 bits, so shouldn't it assume I want to store a 32-bit value without my having to specify dword after the mov?

Surely if I wanted to put the 16 bit representation of 12345 (smaller number to fit in 16 bits) into eax I would do this:

mov ax, 12345

There are 3 answers

zwol On BEST ANSWER

The operand size is ambiguous (and so must be specified) for any instruction with a memory destination and an immediate source. In that case neither operand is actually a register, even if one or more registers appear in the addressing mode.

Address-size and operand-size are separate attributes of an instruction.


Quoting what you said in a comment on another answer, since I think this gets at the core of your confusion:

I would expect mov [eax], 1 to set the 4 bytes held in memory address eax to the 32 bit representation of 1

The BYTE/WORD/DWORD [PTR] annotation is not about the size of the memory address; it's about the size of the variable in memory at that address. Assuming flat 32-bit addressing, addresses are always four bytes long, and therefore must go in Exx registers. So, when the source operand is an immediate value, the dword (or whatever) annotation on the destination operand is the only way the assembler can know whether it's supposed to modify 1, 2, or 4 bytes of RAM.

Perhaps it will help if I demonstrate the effect of these annotations on machine code:

$ objdump -d -Mintel test.o
...
   0:      c6 00  01             mov    BYTE PTR  [eax], 0x1
   3:   66 c7 00  01 00          mov    WORD PTR  [eax], 0x1
   8:      c7 00  01 00 00 00    mov    DWORD PTR [eax], 0x1

(I've adjusted the spacing a bit compared to how objdump actually prints it.)

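Take note of two things: (1) the three size annotations produce three different machine instructions (the byte form uses opcode c6, while the word and dword forms share opcode c7 and the word form adds a 66 operand-size prefix), and (2) changing the annotation changes the length of the immediate source operand as emitted into the machine code (1, 2, or 4 bytes).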
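For reference, here is a minimal sketch of a source file that should assemble to the listing above. The file name test.asm and the nasm command line are assumptions on my part; only test.o appears in the objdump invocation.

```nasm
; test.asm (hypothetical file name)
; Assumed build command: nasm -f elf32 test.asm -o test.o
bits 32
section .text
    mov byte  [eax], 1    ; c6 00 01             stores 1 byte
    mov word  [eax], 1    ; 66 c7 00 01 00       stores 2 bytes
    mov dword [eax], 1    ; c7 00 01 00 00 00    stores 4 bytes
```

The address register is eax in all three lines; only the size keyword changes, and with it the number of bytes written to memory.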

uname01 On

mov [eax], 123456

This instruction uses immediate addressing for the source operand and indirect addressing for the destination operand, i.e. it places the decimal value 123456 at the memory address stored in register eax, as you pointed out. But the object in memory that eax points to does not have to be 32 bits in size. Only the pointer held in eax is 32 bits, so NASM cannot infer the size of the destination operand.

Address-size and operand-size are totally separate attributes of an instruction.

Surely if I wanted to put the 16 bit representation of 12345 into eax I would do this: mov ax, 12345

Yes, but here you are using immediate addressing for the source operand and register addressing for the destination operand. The assembler can infer the amount of data you wish to move from the size of the destination register: 16 bits in the case of the AX register. Note that this leaves the upper 2 bytes of the full EAX unmodified, so you're not actually setting 32-bit EAX to that value.
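A short sketch of that last point, assuming 32-bit mode. The starting value 0xAABBCCDD is arbitrary and chosen only to make the untouched upper half visible.

```nasm
bits 32
section .text
    mov eax, 0xAABBCCDD   ; arbitrary starting value, for illustration only
    mov ax, 12345         ; 12345 = 0x3039; writes only the low 16 bits,
                          ; so eax is now 0xAABB3039
    mov eax, 12345        ; writes all 32 bits; eax is now 0x00003039
```

If the goal is for the whole of EAX to hold 12345, the 32-bit form on the last line is the one to use.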

compile error

I think you meant assembly error :)

Sami Kuhmonen On

In your first case it can determine the size without problems, since EBX is a 32-bit register. But in the second one you're using EAX as an address, not as a destination register, so the NASM developers took the safe route and made the developer choose the size.

If you did mov [eax], 1, what could NASM determine from that? Do you want to set an 8-bit, 16-bit or 32-bit block of memory to 1? It is totally unknown. This is why it's better to force the developer to state the size.

It would be entirely different if you said mov eax, 123456 since then the destination is a register.
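To summarise the cases discussed in this thread, here is a small sketch (assuming 32-bit mode) of when the size can be inferred and when it has to be written out. The particular registers and values are only illustrative.

```nasm
bits 32
section .text
    ; A register operand fixes the size, whichever side it is on.
    mov ebx, [eax]            ; 32-bit load, size taken from ebx
    mov bx,  [eax]            ; 16-bit load, size taken from bx
    mov bl,  [eax]            ; 8-bit load, size taken from bl
    mov [eax], ebx            ; 32-bit store, size taken from ebx
    mov [eax], bl             ; 8-bit store, size taken from bl
    mov eax, 123456           ; 32-bit, destination is a register

    ; Memory destination with an immediate source: nothing implies a size,
    ; so it has to be stated.
    mov dword [eax], 123456
    add dword [eax], 1        ; the same applies to other instructions
```

In every line that uses [eax], eax only supplies the address; the operand size comes from the other register or from the explicit keyword.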