Why is movzbl used in assembly when casting unsigned char to signed data types?


I'm learning data movement (MOV) in assembly.
I compiled some code on an x86-64 Ubuntu 18.04 machine to look at the generated assembly:

typedef unsigned char src_t;
typedef xxx dst_t;

dst_t cast(src_t *sp, dst_t *dp) {
    *dp = (dst_t)*sp;
    return *dp;
}

Here src_t is unsigned char. For dst_t, I tried char, short, int, and long. The results are shown below:

// typedef unsigned char src_t;
// typedef char dst_t;
//  movzbl  (%rdi), %eax
//  movb    %al, (%rsi)

// typedef unsigned char src_t;
// typedef short dst_t;
//  movzbl  (%rdi), %eax
//  movw    %ax, (%rsi)

// typedef unsigned char src_t;
// typedef int dst_t;
//  movzbl  (%rdi), %eax
//  movl    %eax, (%rsi)

// typedef unsigned char src_t;
// typedef long dst_t;
//  movzbl  (%rdi), %eax
//  movq    %rax, (%rsi)
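To produce all four listings above from a single file, the question's template can be stamped out once per dst_t with a macro. This is a sketch: DEFINE_CAST and the cast_char / cast_short / cast_int / cast_long names are hypothetical, and the exact asm depends on compiler version and flags (something like `gcc -O2 -S` on the file).

```c
typedef unsigned char src_t;

/* Hypothetical helper: defines one copy of the question's cast()
   for a given destination type. */
#define DEFINE_CAST(name, dst_t)                        \
    dst_t name(src_t *sp, dst_t *dp) {                  \
        *dp = (dst_t)*sp;  /* the only conversion */    \
        return *dp;        /* also the return value */  \
    }

DEFINE_CAST(cast_char,  char)
DEFINE_CAST(cast_short, short)
DEFINE_CAST(cast_int,   int)
DEFINE_CAST(cast_long,  long)
```

Each function's store instruction should differ in width while the load stays the same.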

Why is movzbl used in every case? Shouldn't the load instruction depend on dst_t? Thanks!

Accepted answer, by Peter Cordes:

If you're wondering why not movzbw (%rdi), %ax for short: writing an 8-bit or 16-bit partial register has to merge with the previous high bytes of the full register.

Writing a 32-bit register like EAX implicitly zero-extends into the full RAX, avoiding both a false dependency on the old value of RAX and any extra merging uop. (See the Q&A "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?")

The "normal" way to load a byte on x86 is with movzbl or movsbl, the same as on RISC machines such as ARM (ldrb / ldrsb) or MIPS (lbu / lb).

The weird CISC thing that GCC usually avoids is merging into the old value by replacing only the low bits, as in movb (%rdi), %al (see the Q&A "Why doesn't GCC use partial registers?"). Clang is more reckless: it more often writes partial registers, not just reads them for stores. You might well see clang load into just %al and store it when dst_t is signed char.
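A toy C model of the architectural effect of these three loads on RAX may make the merge visible. The function names are hypothetical, and the model only computes the resulting register value; it says nothing about how a particular CPU renames registers or inserts merge uops.

```c
#include <stdint.h>

/* movb (%rdi), %al: replaces bits 0..7, keeps bits 8..63 of the old RAX. */
uint64_t write_al(uint64_t old_rax, uint8_t byte) {
    return (old_rax & ~(uint64_t)0xFF) | byte;
}

/* movzbw (%rdi), %ax: zero-extends to 16 bits, keeps bits 16..63. */
uint64_t write_ax_movzbw(uint64_t old_rax, uint8_t byte) {
    return (old_rax & ~(uint64_t)0xFFFF) | (uint16_t)byte;
}

/* movzbl (%rdi), %eax: zero-extends to 32 bits, and the 32-bit write
   also clears bits 32..63, so the old RAX value is not used at all. */
uint64_t write_eax_movzbl(uint64_t old_rax, uint8_t byte) {
    (void)old_rax;  /* no dependency on the previous contents */
    return (uint64_t)(uint32_t)byte;
}
```

Only write_eax_movzbl ignores old_rax; the other two results depend on whatever RAX held before, which is the dependency the answer describes.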


If you're wondering why not movsbl (%rdi), %eax (sign-extension)

The source value is unsigned, therefore zero-extension (not sign-extension) is the correct way to widen it according to C semantics. To get movsbl, you'd need return (int)(signed char)c.
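A minimal pair of C functions showing the two source-level spellings. With optimization, GCC and clang typically compile the first to movzbl and the second to movsbl, though that is compiler-dependent. The function names are hypothetical. Converting a byte value >= 128 to signed char is implementation-defined in C; GCC documents it as wrapping modulo 256, which is what the -56 below relies on.

```c
/* Zero-extension: the value 0..255 is preserved (typically movzbl). */
int widen_zero(const unsigned char *p) {
    return *p;                    /* implicit unsigned char -> int */
}

/* Sign-extension: reinterpret the byte as signed char first
   (typically movsbl). */
int widen_sign(const unsigned char *p) {
    return (int)(signed char)*p;
}
```

For a byte holding 200, widen_zero returns 200 while widen_sign returns -56 under GCC's wrapping rule.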

In *dp = (dst_t)*sp; the cast to dst_t is already implicit from the assignment to *dp.
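A small sketch of that point: C converts the right operand of = to the type of the left operand, so writing the cast or leaving it out gives the same result. The function names are hypothetical, and short stands in for dst_t.

```c
/* With the explicit cast, as in the question. */
short store_explicit(const unsigned char *sp, short *dp) {
    *dp = (short)*sp;
    return *dp;
}

/* Without the cast: the assignment performs the same conversion. */
short store_implicit(const unsigned char *sp, short *dp) {
    *dp = *sp;
    return *dp;
}
```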


The value-range for unsigned char is 0..255 (on x86 where CHAR_BIT = 8).

Zero-extending this to signed int produces values in the range 0..255, i.e. every value is preserved as a non-negative signed integer.

Sign-extending this to signed int would produce values in the range -128..+127, changing every unsigned char value >= 128 (for example, 200 would become -56). That conflicts with the C rule that a widening conversion preserves the value.
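A bit-level sketch of the two extensions over all 256 byte values makes the counts concrete. zext8, sext8 and count_preserved are hypothetical names; sext8 computes the sign-extended value with arithmetic instead of a signed char conversion, so it avoids implementation-defined behavior.

```c
#include <stdint.h>

/* Model of movzbl: high bits filled with zeros. */
int32_t zext8(uint8_t b) {
    return (int32_t)b;
}

/* Model of movsbl: bit 7 is copied into the high bits, which for
   bytes >= 0x80 is the same as subtracting 256. */
int32_t sext8(uint8_t b) {
    return (int32_t)b - ((b & 0x80) ? 256 : 0);
}

/* Count how many of the 256 unsigned char values keep their value
   under a given extension. */
int count_preserved(int32_t (*ext)(uint8_t)) {
    int n = 0;
    for (int v = 0; v <= 255; v++)
        if (ext((uint8_t)v) == v)
            n++;
    return n;
}
```

Zero-extension preserves all 256 values; sign-extension preserves only 0..127, i.e. 128 of them.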


Shouldn't it correspond to dst_t?

It has to widen to at least the width of dst_t. It turns out that widening all the way to 64 bits with movzbl (the top 32 bits are handled by the implicit zero-extension of a 32-bit register write) is the most efficient way to widen at all.
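One way to check that widening "too far" is harmless is to model the compiled sequence: load with movzbl into a 64-bit value, then keep only the low 1, 2, 4 or 8 bytes, as the movb/movw/movl/movq stores do. This sketch uses fixed-width unsigned types as stand-ins for char/short/int/long, on the assumption that only the stored bit pattern matters; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if, for every byte value, truncating the 64-bit
   zero-extended register gives the same bits as converting the
   byte directly to each narrower width. */
int widen_then_truncate_matches(void) {
    for (int v = 0; v <= 255; v++) {
        uint8_t byte = (uint8_t)v;
        uint64_t rax = (uint32_t)byte;                  /* movzbl (%rdi), %eax */
        if ((uint8_t)rax  != byte)            return 0; /* movb %al,  (%rsi) */
        if ((uint16_t)rax != (uint16_t)byte)  return 0; /* movw %ax,  (%rsi) */
        if ((uint32_t)rax != (uint32_t)byte)  return 0; /* movl %eax, (%rsi) */
        if (rax           != (uint64_t)byte)  return 0; /* movq %rax, (%rsi) */
    }
    return 1;
}
```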

The store to *dp is where dst_t shows up: its width (movb, movw, movl or movq) is a nice demonstration that the asm really is for a dst_t whose width is not 32 bits.

Anyway, note that there's only one conversion happening. Your src_t is converted to dst_t in al/ax/eax/rax by the load instruction, then stored to *dp at whatever width dst_t has, and also left in that register as the return value.

A zero-extending load is normal even if you're just going to read the low byte of that result.