I am not sure how to word this question, but I am curious how disassemblers and similar tools show the instructions (opcodes) that certain bytes encode.
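    #include <windows.h>   // ::BYTE, __in
    #include <cstdlib>     // EXIT_SUCCESS
    #include <iostream>
    #include <string>

    // The function takes a pointer, since it is handed a sequence of bytes.
    std::string BytesToOpcode( __in const ::BYTE* Bytes );

    int main( void )
    {
        std::cout << BytesToOpcode( reinterpret_cast<const ::BYTE*>( "\x33\xC0" ) );
        std::cin.get( );
        return( EXIT_SUCCESS );
    }

    // I don't know what type to return, so I'll just use std::string as an example.
    std::string BytesToOpcode( __in const ::BYTE* Bytes )
    {
        // Convert Bytes to opcode??
        return( "" );
    }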
Output should be:
XOR EAX,EAX
Generally, a disassembler will have a combination of tables and a "decode type" (usually a function pointer or a value that goes into a switch statement). The decode type tells which class the instruction belongs to. For example, xor, or, and, add, sub would share the same decoding, but call, jmp would use a different decoding, and jnz, jz, jnc, jc, ja, jb, jbe, etc. would have yet another decode type.

So the first-level table is a 256-entry table indexed by the first opcode byte. Certain entries are "prefixes", such as 0x0F, where the next byte tells what the instruction "really is". For each such prefix you again get a 256-entry table. Some entries may not be valid, as not ALL combinations are taken so far [although nearly all].
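The table-plus-decode-type idea can be sketched roughly as follows. This is only an illustration, not a complete decoder: the names DecodeType, OpcodeEntry, MakePrimaryTable and DescribeOpcode are made up for this example, only a handful of one-byte opcodes are filled in, and 0x0F is assumed to be the byte that escapes to a second table.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical decode classes: one per "shape" of encoding.
enum class DecodeType { Invalid, AluModRM, JccRel8, CallJmpRel32, Escape0F };

struct OpcodeEntry {
    const char* mnemonic;  // nullptr for unused or escape entries
    DecodeType  type;      // tells the decoder how to read the following bytes
};

// Build the first-level 256-entry table (deliberately sparse in this sketch).
std::array<OpcodeEntry, 256> MakePrimaryTable()
{
    std::array<OpcodeEntry, 256> t{};  // every entry starts as {nullptr, Invalid}
    t[0x01] = {"add",  DecodeType::AluModRM};
    t[0x09] = {"or",   DecodeType::AluModRM};
    t[0x21] = {"and",  DecodeType::AluModRM};
    t[0x29] = {"sub",  DecodeType::AluModRM};
    t[0x31] = {"xor",  DecodeType::AluModRM};
    t[0x33] = {"xor",  DecodeType::AluModRM};
    t[0x72] = {"jc",   DecodeType::JccRel8};
    t[0x73] = {"jnc",  DecodeType::JccRel8};
    t[0x74] = {"jz",   DecodeType::JccRel8};
    t[0x75] = {"jnz",  DecodeType::JccRel8};
    t[0xE8] = {"call", DecodeType::CallJmpRel32};
    t[0xE9] = {"jmp",  DecodeType::CallJmpRel32};
    t[0x0F] = {nullptr, DecodeType::Escape0F};
    return t;
}

// Look up one opcode byte and say what the decoder would have to do next.
std::string DescribeOpcode(std::uint8_t opcode)
{
    static const std::array<OpcodeEntry, 256> table = MakePrimaryTable();
    const OpcodeEntry& e = table[opcode];
    switch (e.type) {
    case DecodeType::AluModRM:
        return std::string(e.mnemonic) + ": ModRM byte follows";
    case DecodeType::JccRel8:
        return std::string(e.mnemonic) + ": 8-bit relative offset follows";
    case DecodeType::CallJmpRel32:
        return std::string(e.mnemonic) + ": 32-bit relative offset follows";
    case DecodeType::Escape0F:
        return "escape: look up the next byte in a second 256-entry table";
    case DecodeType::Invalid:
        break;
    }
    return "(not in this sketch)";
}
```

With this table, DescribeOpcode(0x33) gives "xor: ModRM byte follows" and DescribeOpcode(0x75) gives "jnz: 8-bit relative offset follows". A real decoder would dispatch on the decode type to a routine that consumes the operand bytes instead of returning a description.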
A tricky one is the "modifier prefix" entries. For example, 0x66 will switch an instruction from 32-bit to 16-bit operand size (or vice versa if the processor is in 16-bit mode).
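A minimal sketch of how the 0x66 prefix might be tracked before the opcode lookup. PrefixState, ScanPrefixes and OperandBits are hypothetical names for this example, and only the operand-size prefix is handled; a real decoder would also have to recognise the other prefix bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record of what the prefix scan found.
struct PrefixState {
    bool        operandSizeOverride = false;  // a 0x66 prefix was seen
    std::size_t length = 0;                   // number of prefix bytes consumed
};

// Consume leading 0x66 bytes; the opcode itself starts at code[result.length].
PrefixState ScanPrefixes(const std::uint8_t* code, std::size_t size)
{
    PrefixState st;
    while (st.length < size && code[st.length] == 0x66) {
        st.operandSizeOverride = true;
        ++st.length;
    }
    return st;
}

// Effective operand size in bits: the prefix flips the mode's default.
int OperandBits(bool default32, const PrefixState& st)
{
    const bool is32 = st.operandSizeOverride ? !default32 : default32;
    return is32 ? 32 : 16;
}
```

For the bytes 66 33 C0 in 32-bit mode, the scan consumes one byte and OperandBits reports 16, so the same opcode would be printed with 16-bit register names (xor ax, ax rather than xor eax, eax).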
A lot of the actual decoding inside each category will involve twiddling bits and translating "bits 5-3" to a register number or "bits 7-6" to an addressing mode (is it eax, [eax] or [eax+esi], for example).

It's quite a lot of work. I wrote a disassembler for the 80186, and it took me about two days of pretty much all-day work. However, I already knew what I was doing. Converting that to 386 took another 2-3 days, and I wouldn't want to think about doing it for a modern x86 processor with all the SSE, MMX, 3DNow! etc. instructions.
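To make the bit twiddling concrete, here is a rough sketch that decodes just one opcode, 0x33 (XOR r32, r/m32), in 32-bit mode. The names kReg32, DecodeRM and DecodeXor33 are invented for this example. It assumes the standard ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) and leaves out every form that carries a displacement, returning an empty string for those.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit register names, indexed by their 3-bit register number.
static const char* const kReg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

// Decode the r/m operand starting at the ModRM byte p[0].
// Returns "" for forms this sketch does not handle (any displacement).
std::string DecodeRM(const std::uint8_t* p, std::size_t size)
{
    if (size < 1) return "";
    const unsigned mod = p[0] >> 6;   // bits 7-6: addressing mode
    const unsigned rm  = p[0] & 7;    // bits 2-0: register or memory selector
    if (mod == 3) return kReg32[rm];  // plain register, e.g. eax
    if (mod != 0 || rm == 5) return "";                       // displacement forms
    if (rm != 4) return std::string("[") + kReg32[rm] + "]";  // e.g. [eax]

    // rm == 4 means a SIB byte follows: scale (7-6), index (5-3), base (2-0).
    if (size < 2) return "";
    const unsigned scale = 1u << (p[1] >> 6);
    const unsigned index = (p[1] >> 3) & 7;
    const unsigned base  = p[1] & 7;
    if (base == 5) return "";         // disp32 base, not handled here
    std::string s = std::string("[") + kReg32[base];
    if (index != 4) {                 // index 4 means "no index register"
        s += std::string("+") + kReg32[index];
        if (scale != 1) s += "*" + std::to_string(scale);
    }
    return s + "]";
}

// Decode "33 /r" (XOR r32, r/m32); returns "" if the bytes do not match.
std::string DecodeXor33(const std::uint8_t* code, std::size_t size)
{
    if (size < 2 || code[0] != 0x33) return "";
    const unsigned reg = (code[1] >> 3) & 7;  // bits 5-3: destination register
    const std::string rm = DecodeRM(code + 1, size - 1);
    if (rm.empty()) return "";
    return std::string("xor ") + kReg32[reg] + ", " + rm;
}
```

Fed the two bytes from the question, 33 C0, DecodeXor33 returns "xor eax, eax": 0xC0 has mod = 3, reg = 0 and r/m = 0. The bytes 33 04 30 come out as "xor eax, [eax+esi]", because r/m = 4 pulls in the SIB byte 0x30.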
[And I've taken far too long explaining how to do this to get a "correct answer" - even though this IS the correct answer of how you do this - of course, using an already existing library is clearly the simpler way to do it].