How to look up what form of an instruction is used, by opcode or disassembly?

Question


1.1k views Asked by Eric Stotch At 13 December 2020 at 23:44

Sites like https://uops.info/ and Agner Fog's instruction tables, and even Intel's own manuals, list various forms of the same instruction. For example add m, r (in Agner's tables) or add (m64, r64) on uops.info, or ADD r/m64, r64 in Intel's manual (https://www.felixcloutier.com/x86/add).

Here's a simple example I ran on godbolt:

__thread int a;
void Test() {
    a+=5;
}

The add is add DWORD PTR fs:0xfffffffffffffffc,0x5. Its machine code starts with the bytes 64 83 04 25.

There are a few ways to write my real code, but I wanted to look up how many cycles this might take, along with other information. How the heck do I find the reference entry for this instruction? I tried https://uops.info/table.html, typing in "add" and checking off my architecture, but I have no idea which of the entries is the instruction that's being used.

For now, in this specific case, I'm guessing the form is add m64, r64, but I have no idea whether there's any penalty for using fs: before the address, or whether there's a way to see opcodes so I can confirm I'm looking at the right reference.


There are 2 answers

SSpoke On 14 December 2020 at 00:21

Look at the Intel manual for x86 CPUs. It's about 6000 pages long, so I'm sure it's in there: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Also check out http://ref.x86asm.net/coder64.html and just search for 64 (it shows the opcode greyed out). As you can see, 64 has nothing to do with the ADD opcode; it's just the FS: segment-override prefix. 83 is the actual opcode byte, with the /0 in the ModRM reg field selecting ADD.

Here is how your opcode decodes when I simulated it in the IDA disassembler (the screenshots of the ASM view are not preserved here).

**Peter Cordes** · Accepted Answer · 2020-12-14T00:05:28+00:00

http://ref.x86asm.net/coder64.html has an opcode map, but with a bit of experience you won't need one most of the time. Especially when you have disassembly, you can just check the manual entry for that mnemonic (https://www.felixcloutier.com/x86/add), and see which of the possible opcodes it is (83 /0 add r/m32, imm8).

Clearly this has a 32-bit operand-size (dword ptr) memory destination, and the source is an immediate (numeric constant). That rules out the add m64, r64 form for 2 separate reasons: the operand size is 32-bit, not 64-bit, and the source is an immediate, not a register. So even without looking at the machine code, it's definitely add r/m32, imm with an imm8 or imm32. Any sane assembler will of course pick imm8 for a small constant that fits in a signed 8-bit integer.
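To confirm that reading against the machine code, the start of the instruction can be split into its fields by hand. The C fragment below is only a rough sketch of that check: the names decoded, decode_head and group1_name are hypothetical, and it assumes at most one FS prefix (no REX or other prefixes) followed by a one-byte opcode that takes a ModRM byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded start of one instruction (hypothetical helper type). */
struct decoded {
    int has_fs;              /* 1 if a 0x64 FS segment-override prefix was seen */
    uint8_t opcode;          /* primary opcode byte */
    unsigned mod, reg, rm;   /* ModRM fields; reg is the "/digit" in the manual */
    int has_sib;             /* 1 if a SIB byte follows the ModRM byte */
    unsigned scale, index, base;
};

/* Assumption: optional 0x64 prefix, then opcode, then ModRM, then maybe SIB. */
static struct decoded decode_head(const uint8_t *p, size_t n) {
    struct decoded d = {0};
    size_t i = 0;
    assert(n >= 2);
    if (p[i] == 0x64) { d.has_fs = 1; i++; }
    d.opcode = p[i++];
    assert(i < n);
    uint8_t modrm = p[i++];
    d.mod = modrm >> 6;
    d.reg = (modrm >> 3) & 7;
    d.rm  = modrm & 7;
    if (d.mod != 3 && d.rm == 4) {   /* rm=100 with a memory operand: SIB follows */
        assert(i < n);
        uint8_t sib = p[i];
        d.has_sib = 1;
        d.scale = sib >> 6;
        d.index = (sib >> 3) & 7;
        d.base  = sib & 7;
    }
    return d;
}

/* Opcodes 80/81/83 share one encoding group; ModRM.reg selects the operation. */
static const char *group1_name(unsigned reg) {
    static const char *const names[8] =
        {"add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"};
    assert(reg < 8);
    return names[reg];
}
```

For the bytes 64 83 04 25 (presumably followed by fc ff ff ff 05 for the disp32 and imm8 shown in the disassembly), this gives has_fs = 1, opcode = 0x83 and reg = 0, so group1_name returns "add", matching the 83 /0 row of the manual. mod = 0 with rm = 4 means a SIB byte follows, and SIB 0x25 has index = 4 (none) and base = 5, which with mod = 0 means a bare 32-bit displacement with no base register.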

Generally different ways of encoding the same instruction aren't special, so the source-level assembly / disassembly is fine, as long as you understand what's a register, what's memory, and what's an immediate.

But there are a few special cases. For example, Agner Fog's guide notes that rotates by 1 using the short-form encoding are slower than rol reg, imm8 even when imm8=1, because the flag-updating special case for rotate-by-1 actually depends on the opcode, not the immediate count. (Intel's documentation apparently assumes your assembler will always pick the short form for rotate by constant 1; the part about "masked count" may only apply to rotate by cl. See https://www.felixcloutier.com/x86/rcl:rcr:rol:ror#flags-affected. I haven't tested this recently and am not 100% sure I'm remembering correctly when OF is updated, although other flags in the SPAZO group are always left unmodified. IIRC that's why rotates by 1 (2 uops) and by cl (3 uops) are slow on Intel, vs. rotates by other immediate counts (1 uop).)
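Since the rotate special case hinges on the opcode byte, telling the forms apart from machine code is a small table lookup. A minimal sketch, assuming the opcode byte and the ModRM reg field have already been extracted (the enum and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

enum rot_form { NOT_ROTATE, ROT_BY_1, ROT_BY_CL, ROT_BY_IMM8 };

/* opcode: primary opcode byte after any prefixes.
   modrm_reg: the /digit; /0../3 are rol, ror, rcl, rcr, while /4../7 are shifts. */
static enum rot_form rotate_form(uint8_t opcode, unsigned modrm_reg) {
    assert(modrm_reg < 8);
    if (modrm_reg > 3)
        return NOT_ROTATE;
    switch (opcode) {
    case 0xD0: case 0xD1: return ROT_BY_1;     /* short form, implicit count of 1 */
    case 0xD2: case 0xD3: return ROT_BY_CL;    /* count in cl */
    case 0xC0: case 0xC1: return ROT_BY_IMM8;  /* explicit imm8 count, even if 1 */
    default:              return NOT_ROTATE;
    }
}
```

So rol eax, 1 assembled as D1 C0 is the by-1 form, while the same operation encoded as C1 C0 01 is the imm8 form, even though the source-level assembly looks identical.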

Or see https://github.com/travisdowns/uarch-bench/wiki/Intel-Performance-Quirks. (Specifically I mean the Q&A "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?": even on Haswell / Skylake, adc al,0 (using the short form with no modrm byte) is 2 uops, and so is the equivalent adc eax, 12345. But adc edx, 12345 is 1 uop using the non-special case.) In cases like these you have to either check the machine code, or know how your assembler will have chosen to encode a given instruction (typically optimizing for size).
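Checking the machine code for the adc case comes down to looking for the accumulator-only opcodes 14 (adc al, imm8) and 15 (adc eax, imm32), which have no ModRM byte. The helper below is a hedged sketch with a hypothetical name; it assumes 64-bit mode and skips at most one 66 operand-size prefix and one REX prefix, nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the instruction uses the no-ModRM accumulator form of ADC
   (14 ib or 15 id), 0 for anything else, including the ModRM-based ADC forms
   80 /2, 81 /2 and 83 /2. */
static int adc_is_short_form(const uint8_t *code, size_t len) {
    size_t i = 0;
    if (i < len && code[i] == 0x66)
        i++;                                  /* operand-size prefix */
    if (i < len && (code[i] & 0xF0) == 0x40)
        i++;                                  /* REX prefix (64-bit mode only) */
    assert(i < len);
    return code[i] == 0x14 || code[i] == 0x15;
}
```

With this, 14 00 (adc al, 0) and 15 39 30 00 00 (adc eax, 12345) are reported as the short form, while 81 D2 39 30 00 00 (adc edx, 12345) is not.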

BTW, using a segment with a non-zero base adds 1 cycle of latency to address generation, IIRC, but isn't a significant throughput penalty. (Unless of course throughput bottlenecks on a latency chain that it's part of...)
