I'm learning data movement (MOV) in assembly.
I compiled some code on an x86_64 Ubuntu 18.04 machine to see the assembly:
typedef unsigned char src_t;
typedef xxx dst_t;   /* xxx = char, short, int or long (see below) */

dst_t cast(src_t *sp, dst_t *dp) {
    *dp = (dst_t)*sp;
    return *dp;
}
where src_t is unsigned char. For dst_t, I tried char, short, int and long.
The result is shown below:
// typedef unsigned char src_t;
// typedef char dst_t;
// movzbl (%rdi), %eax
// movb %al, (%rsi)
// typedef unsigned char src_t;
// typedef short dst_t;
// movzbl (%rdi), %eax
// movw %ax, (%rsi)
// typedef unsigned char src_t;
// typedef int dst_t;
// movzbl (%rdi), %eax
// movl %eax, (%rsi)
// typedef unsigned char src_t;
// typedef long dst_t;
// movzbl (%rdi), %eax
// movq %rax, (%rsi)
Why is movzbl used in every case? Shouldn't the instruction correspond to dst_t?
Thanks!
If you're wondering why not movzbw (%rdi), %ax for short: writing to an 8-bit or 16-bit partial register has to merge with the previous high bytes of the full register. Writing a 32-bit register like EAX implicitly zero-extends into the full RAX, avoiding a false dependency on the old value of RAX and any ALU merging uop. (See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?)
The "normal" way to load a byte on x86 is with movzbl or movsbl, the same as on a RISC machine like ARM (ldrb or ldrsb) or MIPS (lbu/lb). The weird-CISC thing that GCC usually avoids is a merge with the old value that replaces only the low bits, like movb (%rdi), %al. (See: Why doesn't GCC use partial registers?) Clang is more reckless and will more often write partial registers, not just read them for stores. You might well see clang load into just %al and store when dst_t is signed char.

If you're wondering why not movsbl (%rdi), %eax (sign-extension): the source value is unsigned, so zero-extension (not sign-extension) is the correct way to widen it according to C semantics. To get movsbl, you'd need return (int)(signed char)c.

In *dp = (dst_t)*sp; the cast to dst_t is already implicit in the assignment to *dp.

The value range for unsigned char is 0..255 (on x86, where CHAR_BIT = 8). Zero-extending it to signed int gives values in the range 0..255, preserving every value as a non-negative signed integer. Sign-extending it to signed int would give values in the range -128..+127, changing the value of any unsigned char >= 128. That conflicts with the C rule that widening conversions preserve values.

The load has to widen at least as wide as dst_t. It turns out that movzbl, which zero-extends to 32 bits (with the upper 32 bits of RAX zeroed implicitly by writing a 32-bit register), is the most efficient way to widen to any of these widths, even 64-bit long. Storing to *dp is a nice demo that the asm is for a dst_t with a width other than 32 bits.

Anyway, note that there's only one conversion happening. Your src_t gets converted to dst_t in al/ax/eax/rax by the load instruction, then stored to *dp at whatever width dst_t has, and also left in that register as the return value. A zero-extending load is normal even if you're only going to read the low byte of that result.