Reading the RISC-V unprivileged specification, I see that U-format instructions (lui, ...) are defined with imm[31:12] in bits 31:12 of the instruction, rd in bits 11:7, and the opcode in bits 6:0.
But the immediate value doesn't make sense to me here. Specifically, given an instruction like lui t0, 0xABCDE, the 20 bits of the immediate would (and should) end up in the upper 20 bits of t0 (i.e., t0 = 0xABCDE000).
The imm[31:12] notation makes it look like 0xABCDE is being shifted left by 12 bits during assembly. Is that correct?
The 20 bits are encoded into the imm field of the instruction, which is a field in the high 20 bits of the 32-bit instruction. The result is a machine code instruction that asks the processor to load the register with 0xABCDE000. Does this encoding involve shifting the immediate stated in assembly code left by 12 bits? Yes, I would agree with that.
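To make the "shift" concrete, here is a minimal sketch of how an assembler might build the machine word for lui and what the processor then writes to the register on RV32. The function names encode_lui and execute_lui are made up for illustration; the field positions and the 0x37 opcode are those of the U-format lui in the base ISA.

```python
LUI_OPCODE = 0x37  # 7-bit opcode of lui in the base ISA


def encode_lui(rd: int, imm20: int) -> int:
    """Hypothetical assembler helper: build the 32-bit word for `lui rd, imm20`."""
    if not 0 <= rd < 32:
        raise ValueError("rd must be a register number 0..31")
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("immediate must fit in 20 bits")
    # The 20-bit immediate lands in instruction bits 31:12, rd in bits 11:7.
    return (imm20 << 12) | (rd << 7) | LUI_OPCODE


def execute_lui(instruction: int) -> int:
    """Value written to rd on RV32: bits 31:12 kept in place, low 12 bits zero."""
    return instruction & 0xFFFFF000


word = encode_lui(rd=5, imm20=0xABCDE)  # t0 is register x5
print(hex(word))               # 0xabcde2b7
print(hex(execute_lui(word)))  # 0xabcde000
```

Note that the hardware does no shifting at all here: the immediate already sits in bits 31:12 of the instruction word, so it only has to clear the low 12 bits.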
To be clear, the form the assembler accepts is a matter of syntax, while the form the machine code takes is a matter of encoding. We could write a different assembler that requires you to specify the value 0xABCDE000 (and complains if the lower 12 bits are non-zero), so that there is no assemble-time "shifting".
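As a sketch of that alternative syntax, the hypothetical helper below takes the full 32-bit value instead of the 20-bit field. It is not how any existing assembler behaves; it only shows that the same machine word comes out with no shift, since the value's upper 20 bits are already in position.

```python
def encode_lui_full_value(rd: int, value: int) -> int:
    """Hypothetical alternative syntax: `lui rd, 0xABCDE000` (full 32-bit value)."""
    if not 0 <= rd < 32:
        raise ValueError("rd must be a register number 0..31")
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    if value & 0xFFF:
        # lui cannot set the low 12 bits, so reject values that need them.
        raise ValueError("lower 12 bits must be zero")
    # No shift needed: bits 31:12 of the value are bits 31:12 of the instruction.
    return value | (rd << 7) | 0x37  # 0x37 is the lui opcode


print(hex(encode_lui_full_value(5, 0xABCDE000)))  # 0xabcde2b7
```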
In RISC-V, other instructions' immediates are encoded in a much more complex manner. One large immediate is split and inserted into separate fields; this keeps the register fields in the same place in every instruction, and also keeps the sign bit of the immediate in the same place, bit 31 (sign extension has to be performed by the hardware when the instruction is decoded, so a fixed sign-bit position simplifies it).
Look at the S-format, for example, where the 12-bit immediate is split into two separate fields in the encoded instruction.
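The S-format split can be sketched as follows. The helper names encode_store and decode_store_imm are invented for illustration; the field layout (imm[11:5] in bits 31:25, rs2 in 24:20, rs1 in 19:15, funct3 in 14:12, imm[4:0] in 11:7, opcode 0x23 for stores) follows the base ISA.

```python
STORE_OPCODE = 0x23  # opcode shared by sb/sh/sw
FUNCT3_SW = 0b010    # funct3 value selecting sw


def encode_store(funct3: int, rs1: int, rs2: int, imm: int) -> int:
    """Hypothetical helper: build an S-format word, splitting the 12-bit immediate."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate must fit in 12 signed bits")
    imm12 = imm & 0xFFF    # two's-complement view of the immediate
    imm_hi = imm12 >> 5    # imm[11:5], goes to instruction bits 31:25
    imm_lo = imm12 & 0x1F  # imm[4:0], goes to instruction bits 11:7
    return ((imm_hi << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (imm_lo << 7) | STORE_OPCODE)


def decode_store_imm(instruction: int) -> int:
    """Reassemble the two fields and sign-extend from bit 11."""
    imm12 = ((instruction >> 25) << 5) | ((instruction >> 7) & 0x1F)
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


word = encode_store(FUNCT3_SW, rs1=2, rs2=5, imm=8)  # sw t0, 8(sp)
print(f"{word:#010x}")         # 0x00512423
print(decode_store_imm(word))  # 8
```

The sign bit of the immediate (imm[11]) ends up in instruction bit 31, the same position that holds the top immediate bit in the U-format.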