To give a little bit of background, I wanted to study how x86 instructions are encoded/decoded manually. I came across the ModR/M and SIB bytes, and it seems that understanding the x86 addressing modes is fundamental to understanding the instruction encoding scheme.
Hence, I did a Google search for x86 addressing modes. Most blogs/videos that the search returned covered addressing modes for the 8086 processor. Going through some of them, the addressing modes listed were Register, Direct, Indirect, Indexed, Based, and some more. But the blogs use inconsistent names when referring to these addressing modes, and different sources list different sets of modes. These terms are not even mentioned in the Intel manual. For example, I can't seem to find anywhere in the Intel manual an addressing mode called Direct or Indirect. Also, the `Mod` field in the `ModRM` byte is only 2 bits wide, which makes me wonder whether more than 4 addressing modes are possible.
My question is: are terms like Direct and Indirect addressing modes older terms that are no longer used in the Intel manuals but are still used by the general public? If the terms technically do exist, where can I find a reference to them in the manuals?
There aren't really official names for most forms of x86 addressing modes. They all have the form `[base + index*scale + disp8/disp32]` (or a subset of any 1 or 2 components of that), except for 64-bit RIP-relative addressing. See "Referencing the contents of a memory location. (x86 addressing modes)" for a breakdown of what you can do with each subset.
Intel does officially use those names for components of addressing modes, in section 3.7.5 of volume 1 (quoted below). They also use Register vs. Immediate vs. Memory to classify operands, but usually don't make a big deal about different forms of addressing mode for memory operands. (In x86 machine code, it's common for one operand to be r/m, i.e. it can be a register or memory depending on the 2-bit "mod" field in the ModRM byte, while the other operand is definitely a register or definitely an immediate, as implied by the opcode. e.g. see the forms of `add`.)
`Mod` chooses Register vs. Memory with disp0/8/32. There are "escape" codes for more modes:
- The encoding that would mean `[rbp]` with no displacement instead means there's a disp32 with no base. (This is why you see `[rbp+0]` in disassembly: the best encoding for `[rbp]` is base=rbp, with a disp8 of 0. Note that `[rbp]` isn't useful when it's a frame pointer.)
- The encoding that would mean base=rsp instead signals that a SIB byte follows, and in the SIB byte the encoding that would mean index=rsp instead means no index. (This makes it possible to encode `[rsp]`, instead of the less-useful `[rsp+rsp]`.)
When writing in English about assembly language, it's natural to use terms with obvious meanings, including some that you mentioned. For example, Intel's optimization manual says (my emphasis):
Indexed addressing modes include any combination that uses `idx*scale`, regardless of whether it's with a base reg or with a disp32, or both. (`idx` alone is not encodable; `[rax*1]` is actually encoded as `disp32+idx*1` with `disp32=0`.) At some point they say "any addressing mode with an index" or similar; otherwise it might not be clear exactly what they meant. Of course, testing with performance counters can verify the interpretation.
But they don't overdo it with making up names for things. When there isn't an obvious English phrase they can stick on something, they write (still in the Sandybridge section):
In table 2-19, they have two columns, one for `Base + Offset > 2048;` or `Base + Index [+ Offset]`, and another for `Base + Offset < 2048` with latencies 1 cycle lower (except for 256b AVX loads). (Fun fact, `[rdi+8]` is 1c lower latency than `[rdi-8]`.)
(Technically they probably should have said "displacement", because the whole addressing mode calculation (the effective-address) is the offset of the seg:off logical address in x86 terminology, which forms a linear address when added to the segment base. But "offset" is also used to describe immediate constant parts of addressing modes in non-x86 generic terminology. And x86 segmentation is fortunately not something you usually have to think about these days.)
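As a rough illustration of the terminology in the parenthetical above, here is a minimal Python sketch of how the general form combines into an effective address (the "offset"), and how a segment base then turns it into a linear address. The function names `effective_address` and `linear_address` and the example register values are made up for illustration; this is a toy model, not how any real tool computes addresses.

```python
def effective_address(base=0, index=0, scale=1, disp=0, bits=64):
    """Toy model: the offset part of a seg:off logical address.

    Computes base + index*scale + disp, wrapped to the address size.
    Any component left at its default is simply absent from the mode.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factors are 1, 2, 4 or 8")
    mask = (1 << bits) - 1
    return (base + index * scale + disp) & mask


def linear_address(segment_base, offset, bits=64):
    """Toy model: linear address = segment base + offset (effective address)."""
    return (segment_base + offset) & ((1 << bits) - 1)


# [rdi + rcx*4 + 16] with the assumed values rdi=0x1000, rcx=3
ea = effective_address(base=0x1000, index=3, scale=4, disp=16)
print(hex(ea))                     # 0x101c
# With a flat memory model the segment base is 0, so linear == offset.
print(hex(linear_address(0, ea)))  # 0x101c
```

With a zero segment base the two values coincide, which is why the displacement/offset distinction rarely matters in practice.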
In the vol.1 manual, Intel does sort of use some of the terminology you describe. They describe an addressing mode with just a displacement component as "direct" (sort of), and `[reg]` as "indirect", because those terms do get used when talking about instruction sets and what kind of addressing modes they support.
But as you saw, they don't make up names for the more complex forms.
They do distinguish between Immediate vs. Register vs. Memory operands, though (3.7 OPERAND ADDRESSING). However, they usually make little or no distinction between an r/m32 operand that uses a register encoding and the other operand that has to be a register.
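To make the role of the 2-bit `Mod` field concrete, here is a small Python sketch that splits a ModRM byte into its fields and describes the r/m operand. It assumes 64-bit mode with no REX prefix, and the helper names `split_modrm` and `describe_rm` are hypothetical, chosen only for this example. It is a simplified sketch of the encoding rules, not a full decoder (it does not parse the SIB byte or read displacement bytes).

```python
# Register names for r/m values 0..7 (64-bit mode, no REX prefix assumed).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def describe_rm(byte):
    """Describe the r/m operand selected by a ModRM byte (hypothetical helper)."""
    mod, _, rm = split_modrm(byte)
    if mod == 0b11:
        return REGS64[rm]                      # register operand, no memory access
    if rm == 0b100:
        return "SIB byte follows"              # escape: would have meant base=rsp
    if mod == 0b00 and rm == 0b101:
        # escape: would have meant [rbp] with no displacement
        return "[disp32] (RIP-relative in 64-bit mode)"
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    return f"[{REGS64[rm]}{disp}]"


print(describe_rm(0xC0))  # rax
print(describe_rm(0x00))  # [rax]
print(describe_rm(0x45))  # [rbp + disp8]
print(describe_rm(0x04))  # SIB byte follows
```

Only four `Mod` values exist, but the two escape encodings are what let the same byte reach the SIB forms and the displacement-only form.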
Branch instruction terminology
Direct vs. indirect also comes up for branches. It's a bit like talking about the addressing mode for reaching the code bytes that will be run next.
Memory indirect is `jmp [rax]`, where the final value of RIP comes from memory, vs. a register-indirect branch like `jmp rax` that sets RIP=RAX. x86 doesn't have a memory-indirect addressing mode for data loads/stores; code-fetch after a branch is taken introduces the extra level of indirection in the terminology. (Sort of, due to RIP being dereferenced by code-fetch after a new address is loaded into it.)
The vol.2 manual entry for `jmp` does talk about indirect vs. relative or absolute jumps. (Although note that x86 doesn't have absolute direct near jumps (if you can't use relative, put an address in a register and jmp reg); the only absolute direct jumps are slow "far" `jmp ptr16:16` or `jmp ptr16:32` with an immediate pointer as part of the machine code.)
When describing near indirect jumps, `jmp r/m32` (or 64), they say "absolute offset specified indirectly in a GP reg or memory". ("offset" in the seg:off sense here; it will be used as part of cs:eip or cs:rip for code-fetch.)
Segmentation makes x86 addressing more complicated to talk about, especially when comparing special addressing modes that can include a segment explicitly vs. ones that don't.
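The register-indirect vs. memory-indirect distinction for jumps can be sketched with a toy model in Python. The dictionaries standing in for registers and memory, the example addresses, and the helper names `jmp_reg`, `jmp_mem`, and `encode_jmp_rm` are all assumptions for illustration. The encoding helper uses the `FF /4` opcode form of near indirect `jmp` and only handles the two simplest cases (a plain register, or `[reg]` with no displacement and no SIB byte).

```python
def jmp_reg(regs, reg):
    """Register-indirect jump (jmp rax): the new RIP is the register's value."""
    return regs[reg]


def jmp_mem(regs, memory, reg):
    """Memory-indirect jump (jmp [rax]): the register holds the address
    of a pointer, and the new RIP is loaded from that memory location."""
    return memory[regs[reg]]


def encode_jmp_rm(rm, memory_operand):
    """Encode near indirect jmp (opcode FF, ModRM reg field = 4).

    Simplified: rm is a register number 0..7; rsp/rbp escape cases
    are not handled in this sketch.
    """
    mod = 0b00 if memory_operand else 0b11
    return bytes([0xFF, (mod << 6) | (4 << 3) | rm])


regs = {"rax": 0x2000}
memory = {0x2000: 0x401000}  # toy memory: a pointer stored at address 0x2000

print(hex(jmp_reg(regs, "rax")))          # 0x2000
print(hex(jmp_mem(regs, memory, "rax")))  # 0x401000
print(encode_jmp_rm(0, memory_operand=False).hex())  # ffe0  (jmp rax)
print(encode_jmp_rm(0, memory_operand=True).hex())   # ff20  (jmp [rax])
```

The two instructions share an opcode and differ only in the `Mod` field, which matches the way the manual documents them together as `jmp r/m`.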
Naming addressing modes is over-rated
It's far easier to remember what x86 addressing modes can do in terms of subsets of the general case, rather than memorizing all the different possibilities separately with names like Indexed, Based, or whatever.
You see that kind of thing in tutorials like https://www.tutorialspoint.com/microprocessor/microprocessor_8086_addressing_modes.htm or http://www.geeksforgeeks.org/addressing-modes/ that make a big deal out of classifying the addressing modes. The latter even has a quiz asking you to match C statements with some addressing-mode names.
With the less-flexible 16-bit addressing modes, there are few enough that you can try to name them, and Based vs. Indexed does actually give you a different choice of registers. But when you're programming, all you really need to remember is that it's your choice of any subset of `[bx|bp] + [di|si] + disp0/8/16`. This is how `di`/`si` (dst/src index) and maybe `bx`/`bp` got their names.
Terminology like this can be useful in comparing the capabilities of different ISAs. For example, Wikipedia says that old ISAs like PDP-8 made a lot of use of memory-indirect because they had few registers and only 8 bit addressing range with registers.
Wikipedia also says:
There's no sense making a big deal out of naming of modes. If you're writing something, make sure it's clear what you mean without depending on a specific technical meaning for certain terms. e.g. if you say "an index addressing mode", make sure the reader knows from context whether you're including `base+index*scale` or not.
I wonder if some of the desire to name modes originated with 8-bit micros that predate 8086. You might want to ask about this over on https://retrocomputing.stackexchange.com/. I don't know much about addressing modes available on 8-bit CPUs with mostly fixed one-byte instructions.
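As a closing illustration of the "any subset" rule for 16-bit addressing modes mentioned earlier, here is a short Python sketch that enumerates the source-level forms instead of naming them. The function name `sixteen_bit_modes` is hypothetical, and `disp` stands for a displacement of either size; the sketch lists what you can write in assembly, not the machine-code encodings an assembler would pick for each form.

```python
from itertools import product


def sixteen_bit_modes():
    """List every non-empty subset of [bx|bp] + [si|di] + disp (illustrative)."""
    modes = []
    for base, index, disp in product(("", "bx", "bp"), ("", "si", "di"), ("", "disp")):
        parts = [p for p in (base, index, disp) if p]
        if parts:  # skip the empty combination, which is not an addressing mode
            modes.append("[" + " + ".join(parts) + "]")
    return modes


modes = sixteen_bit_modes()
print(len(modes))  # 17
print(modes[:4])   # ['[disp]', '[si]', '[si + disp]', '[di]']
```

Seventeen forms fall out of one rule, which is the argument for remembering the rule and not a table of names.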