For x86 and x64 compilers generate similar zero/sign extend MOVSX and MOVZX. The expansion itself is not free, but allows processors to perform out-of-order magic speed up.
But on RISC-V:
Consequently, conversion between unsigned and signed 32-bit integers is a no-op, as is conversion from a signed 32-bit integer to a signed 64-bit integer.
A few new instructions (ADD[I]W/SUBW/SxxW) are required for addition and shifts to ensure reasonable performance for 32-bit values.
(C) RISC-V Spec
But at the same time, the new modern RISC-V 64-bit processors contains instructions for 32-bit signed integers. Why? To increase performance? Where then are 8 and 16-bits? I already do not understand anything.
This is one of those cases where the ABI starts to bleed in to the ISA. You'll find a handful of these floating around in RISC-V. As a result of us having a pretty significant software stack ported by the time we standardized the ISA we got to fine tune the ISA to match real code. Since an explicit goal of the base RISC-V ISAs was to keep a lot of encoding space available for future expansion.
In this case, the ABI design decision is to answer the question "Is there a canonical representation of types that, when stored in registers, do not need every bit pattern provided by those registers in order to represent every value representable by the type?" In the case of RISC-V we chose to mandate a canonical representation for all types. There's a feedback loop here with some ISA design decisions and I think the best way to go about this is to work through an example of what ISA would have co-evolved with an ABI where we didn't mandate a canonical representation.
As a thought exercise, let's assume that the RISC-V ABI did not mandate a canonical representation for the high bits of
int
when stored in an X register on RV64I. The result here is that the existing W family of instructions wouldn't be particularly useful: you can useaddiw t0, t0, 0
as a sign extension so the compiler can the rely on what's in the high-order bits, but that adds an additional instruction to many common patterns like compare+branch. The correct ISA design decision to make here would be to have a different set of W instructions, something like "compare on the low 32 bits and branch". If you run the numbers, you end up with about the same number of additional instructions (branch and set as opposed to add, sub, and shift). The issue is that the branch instructions are much more expensive in terms of encoding space because they have much longer offsets. Since encoding space is considered an important resource in RISC-V, when there is no clear performance advantage we tend to chose the design decision that conserves more encoding space. In this case there's no meaningful performance distinction as long as the ABI matches the ISA.There's a second order design decision to be made here: is the canonical representation to sign extend or to zero extend? There's a trade off here: sign extension results in faster software (for the same amount of encoding space used), but more complicated hardware. Specifically, the common C fragment
compiles very efficiently when sign extension is used
but is slower when zero-extension is used (again, assuming we're unwilling to blow a lot of encoding space on word-sized compare+branch instructions)
When we balanced things out, it appeared that the software cost of zero extension was higher than the hardware cost of sign extension: for the smallest possible design (ie, a microcoded implementation) you still need an arithmetic right shift so you don't lose any datapath, and for the biggest possible design (ie, a wide out of order core) the code would just end up shuffling bits around before branching. Oddly enough, the one place you pay a meaningful cost for sign extension is in in-order machines with short pipelines: you could shave a MUX delay off the ALU path, which is critical in some designs. In practice there are a lot of other places where sign extension is the right decision to make, so just changing this one wouldn't result in the removal of that datapath.