I'm disassembling and inspecting (mostly for fun and learning) the Arduino code generated for an ESP8266 (Xtensa ISA).
I had been following the code without issues until I reached the curly brackets (location 4010f4c2) in the main function:
4010f494 <main>:
4010f494: 90a092 movi a9, 144
4010f497: c01190 sub a1, a1, a9
4010f49a: 00a022 movi a2, 0
4010f49d: 236102 s32i a0, a1, 140
4010f4a0: 2261c2 s32i a12, a1, 136
4010f4a3: 2161d2 s32i a13, a1, 132
4010f4a6: ffc2c5 call0 4010f0d4 <print_version>
4010f4a9: 202110 or a2, a1, a1
4010f4ac: 001045 call0 4010f5b4 <eboot_command_read>
4010f4af: 00d256 bnez a2, 4010f4c0 <main+0x2c>
4010f4b2: 024c movi.n a2, 64
4010f4b4: fee101 l32r a0, 4010f038 <_stext+0x38>
4010f4b7: 0000c0 callx0 a0
4010f4ba: 1d0c movi.n a13, 1
4010f4bc: 000506 j 4010f4d4 <main+0x40>
4010f4bf: af2200 excw
4010f4c2: 2200a0d2016122ff { l32r a15, 400e794c <__udivsi3+0xd9730>; excw }
4010f4ca: d97ea0 excw
4010f4cd: da0121 l32r a2, 40105cd4 <__udivsi3+0xf7ab8>
4010f4d0: 9c0c11280000c0fe { excw; excw; srli a0, a12, 12 }
4010f4d8: 5a1266 bnei a2, 1, 4010f536 <main+0xa2>
4010f4db: feda21 l32r a2, 4010f044 <_stext+0x44>
4010f4de: fecc01 l32r a0, 4010f010 <_stext+0x10>
4010f4e1: 0000c0 callx0 a0
4010f4e4: fedd01 l32r a0, 4010f058 <_stext+0x58>
4010f4e7: 0000c0 callx0 a0
4010f4ea: 3138 l32i.n a3, a1, 12
4010f4ec: 4148 l32i.n a4, a1, 16
4010f4ee: 2128 l32i.n a2, a1, 8
4010f4f0: 050c movi.n a5, 0
4010f4f2: ffcec5 call0 4010f1e0 <copy_raw>
4010f4f5: 02cd mov.n a12, a2
4010f4f7: fed901 l32r a0, 4010f05c <_stext+0x5c>
4010f4fa: 0000c0 callx0 a0
4010f4fd: fed221 l32r a2, 4010f048 <_stext+0x48>
4010f500: 0c3d mov.n a3, a12
4010f502: fec301 l32r a0, 4010f010 <_stext+0x10>
4010f505: 0000c0 callx0 a0
4010f508: acec bnez.n a12, 4010f536 <main+0xa2>
4010f50a: f27c movi.n a2, -1
4010f50c: 1129 s32i.n a2, a1, 4
4010f50e: 3128 l32i.n a2, a1, 12
4010f510: 2129 s32i.n a2, a1, 8
4010f512: 2dec bnez.n a13, 4010f538 <main+0xa4>
4010f514: fece21 l32r a2, 4010f04c <_stext+0x4c>
4010f517: febe01 l32r a0, 4010f010 <_stext+0x10>
4010f51a: 0000c0 callx0 a0
4010f51d: 2128 l32i.n a2, a1, 8
4010f51f: ffbf05 call0 4010f110 <load_app_from_flash_raw>
4010f522: 02cd mov.n a12, a2
4010f524: 203220 or a3, a2, a2
4010f527: feca21 l32r a2, 4010f050 <_stext+0x50>
4010f52a: feb901 l32r a0, 4010f010 <_stext+0x10>
4010f52d: 0000c0 callx0 a0
4010f530: 0003c6 j 4010f543 <main+0xaf>
4010f533: 000000 ill
4010f536: 4d8c beqz.n a13, 4010f53e <main+0xaa>
4010f538: 201110 or a1, a1, a1
4010f53b: 000d05 call0 4010f60c <eboot_command_clear>
4010f53e: 1128 l32i.n a2, a1, 4
4010f540: d00226 beqi a2, -1, 4010f514 <main+0x80>
4010f543: 5c9c beqz.n a12, 4010f55c <main+0xc8>
4010f545: fec341 l32r a4, 4010f054 <_stext+0x54>
4010f548: f37c movi.n a3, -1
4010f54a: 0020c0 memw
4010f54d: 002422 l32i a2, a4, 0
4010f550: 013310 slli a3, a3, 31
4010f553: 202230 or a2, a2, a3
4010f556: 0020c0 memw
4010f559: 006422 s32i a2, a4, 0
4010f55c: ffff06 j 4010f55c <main+0xc8>
I had seen this before, but I wasn't too bothered about it until the code reached location 4010f4af, which holds a branch instruction to 4010f4c0, an address that lands in the middle of the garbled region around the curly brackets. Even so, if I apply the decoding logic by hand starting at this byte location, I get ffaf22, which corresponds to the valid instruction movi a2, 0xfff.
This code belongs to the eboot.elf file, and I disassemble it like this:
~/.arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/3.0.4-gcc10.3-1757bed/xtensa-lx106-elf/bin/objdump -d eboot.elf
Do you guys know why objdump is showing those curly brackets, and why it is interpreting the bytes like that? Have I misunderstood part of the Xtensa manual? Am I maybe not running the right command?
Thank you very much!
The Xtensa assembler and disassembler use curly brackets for VLIW-style (usually called FLIX in the Xtensa world) instruction bundles: groups of opcodes decoded together as one instruction and executed by the processor in parallel. For example,
{ l32r a15, 400e794c <__udivsi3+0xd9730>; excw }
could be a two-slot instruction with the l32r opcode in the first slot and the excw opcode in the second. But if you see bundles in the disassembly of code for Xtensa cores that don't support FLIX (the lx106, for one, does not), that usually means two things: 1) the disassembler is configured incorrectly and 2) it has likely lost the stream of instructions and is disassembling data or incorrectly composed instruction bytes.
In the example above one can see that the instruction
4010f4af: bnez a2, 4010f4c0 <main+0x2c>
jumps right into the middle of the instruction 4010f4bf: excw. This means that there is a non-instruction byte at address 0x4010f4bf, but the disassembler didn't realize that. Normally the disassembler uses the contents of the .xt.prop section to distinguish instruction bytes from non-instruction bytes, which helps it stay synchronized with the instruction stream; when that section is missing, it loses synchronization like this.
Regarding incorrect configuration: when binutils is built for a specific Xtensa core, one needs to replace certain files in the binutils source with the contents of the Xtensa configuration overlay generated for that core. The overlay contains information about the valid opcodes, instruction formats and their binary representation for that core, and the assembler and disassembler use it to accept and produce only valid instructions. The appearance in the disassembly of instruction formats that the core does not support is a clear sign of misconfiguration.
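The loss of synchronization described above can be spotted mechanically: any branch whose target falls strictly inside another decoded instruction is suspect. Below is a minimal Python sketch of that check. It assumes the objdump text has the shape quoted in the question (address, colon, one blob of hex bytes, then the instruction text); the function name find_misaligned_branches and the rule used to pick out branch mnemonics are illustrative choices, not part of binutils.

```python
import re

# Assumed line shape: "<addr>: <hex bytes> <instruction text>"
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(.*)$")
# A code address operand as objdump prints it: "4010f4c0 <main+0x2c>"
TARGET_RE = re.compile(r"\b([0-9a-f]+) <[^>]*>")


def find_misaligned_branches(listing):
    """Return (branch_addr, target, containing_insn_addr) for every branch
    whose target lies strictly inside another decoded instruction."""
    insns = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m:
            addr = int(m.group(1), 16)
            size = len(m.group(2)) // 2  # two hex digits per byte
            insns.append((addr, size, m.group(3).strip()))

    problems = []
    for addr, _size, text in insns:
        mnemonic = text.split()[0] if text else ""
        # Heuristic (an assumption): b* mnemonics, j and call0 carry code targets.
        if not (mnemonic.startswith("b") or mnemonic in ("j", "call0")):
            continue
        m = TARGET_RE.search(text)
        if not m:
            continue
        target = int(m.group(1), 16)
        for start, length, _ in insns:
            if start < target < start + length:
                problems.append((addr, target, start))
    return problems


# Lines copied from the listing in the question.
SAMPLE = """\
4010f4af: 00d256 bnez a2, 4010f4c0 <main+0x2c>
4010f4b2: 024c movi.n a2, 64
4010f4bc: 000506 j 4010f4d4 <main+0x40>
4010f4bf: af2200 excw
4010f4c2: 2200a0d2016122ff { l32r a15, 400e794c <__udivsi3+0xd9730>; excw }
4010f4d0: 9c0c11280000c0fe { excw; excw; srli a0, a12, 12 }
"""

if __name__ == "__main__":
    for branch, target, inside in find_misaligned_branches(SAMPLE):
        print(f"branch at {branch:x} targets {target:x}, inside insn at {inside:x}")
```

On these lines the check flags the bnez at 4010f4af and also the j at 4010f4bc, whose target 4010f4d4 lands inside the second curly-bracket bundle.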
Excessive use of excw is yet another telltale sign of a bogus disassembly: because of a bug in the Xtensa overlay generator (fixed somewhere between the RG-2017.5 and RG-2017.8 releases of the Xtensa tools), the binutils disassembler reports the excw opcode in place of any unrecognized opcode when configured with an overlay produced by the buggy tools.
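As a rough way to apply the two telltale signs mentioned in this answer (curly-bracket bundles on a core without FLIX, and frequent excw), one could simply count them in the objdump output. The Python sketch below does that; it assumes the same line shape as the listing in the question, and the function name disassembly_warning_signs is hypothetical. What counts as "excessive" is left to the reader, since no threshold is given here.

```python
import re

# Assumed line shape: "<addr>: <hex bytes> <instruction text>"
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(.*)$")
EXCW_RE = re.compile(r"\bexcw\b")


def disassembly_warning_signs(listing):
    """Count decoded lines, curly-bracket bundles and excw opcodes."""
    total = bundles = excw = 0
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip symbol headers and blank lines
        total += 1
        text = m.group(3).strip()
        if text.startswith("{"):
            bundles += 1
        # A bundle may contain several excw slots, so count every occurrence.
        excw += len(EXCW_RE.findall(text))
    return {"instructions": total, "bundles": bundles, "excw": excw}


# Lines copied from the listing in the question.
SAMPLE = """\
4010f4bc: 000506 j 4010f4d4 <main+0x40>
4010f4bf: af2200 excw
4010f4c2: 2200a0d2016122ff { l32r a15, 400e794c <__udivsi3+0xd9730>; excw }
4010f4ca: d97ea0 excw
4010f4d0: 9c0c11280000c0fe { excw; excw; srli a0, a12, 12 }
"""

if __name__ == "__main__":
    print(disassembly_warning_signs(SAMPLE))
```

Any nonzero bundle count for an lx106 binary already points at a misconfigured disassembler, as explained above.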