I'm creating an x86 decoder and I'm struggling on understanding and finding an efficient way to calculate the mnemonic of an instruction.
I know that the opcode 6 MSBs are the opcode bits, but I can't find anywhere that use those 6 bits in a mnemonic table. The only mnemonic table I find is for the whole opcode byte itself and not just the 6 MSBs.
I wanted to ask what are some efficient ways I can go on decoding the mnemonics encoded in the opcode byte, and if there're any table references using the 6 MSBs and not the whole opcode byte.
This has become an algorithms and data structures question.
As you point out, many of the opcode table entries (at least for the table without a
0f
escape byte: http://sparksandflames.com/files/x86InstructionChart.html) do repeat in groups of 4 or 2, i.e. with the same 6 or 7-bit prefix selecting the same mnemonic.Obviously a 256-entry table of structs is simple, but duplicates things. It's very fast and easy to use, since it's probably still small enough not to cache-miss very often. (Especially since the common entries will stay hot in cache; x86 code uses the same opcodes a lot.)
You can trade simplicity / performance for space.
You could have a 64-entry table of structs where one member is a pointer to a secondary table to be indexed with the low 2 bits. If the pointer is NULL, it means the instruction follows the pattern of
add
/and
/xor
/ etc. where the low 2 bits tell you 8 bit vs. whatever the operand-size is and direction (r/m,reg or reg,r/m).Your struct would also need entries for turning into other instructions when certain prefixes are present (e.g.
rep nop
ispause
). Also, AVX VEX prefixes use what used to be an invalid encoding of another instruction. x86 is pretty crazy to decode if you want to do a complete job for all the current extensions.Actually, it might be simplest (and also efficient) to just use a table of function pointers. Or a struct with a
const char* mnemonic
and aint (*decode)(const char*mnemonic, const char *insn_bytes, unsigned prefix_bitmap)
function, so lots of opcodes can point to the same decode-function but still get different mnemonics. Sometimes the function will ignore the passed mnemonic, but other times that's all it needs. You'd have a common function for decoding addressing modes that many of the decode functions would call.This is fairly similar to how you might implement an x86 emulator that interprets, instead of doing dynamic recompilation. A common decode loop and then dispatching through function pointers.
An even more complicated data structure you might use is a radix trie aka prefix tree. See also https://en.wikipedia.org/wiki/Trie#Bitwise_tries.
This is getting into silly season, because the density is so high that a lookup table makes much more sense. (There are very few undefined opcode).