How to decode the section table in an ELF?


I'm analyzing this tiny ELF file:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  78 00 40 00 00 00 00 00  |..>.....x.@.....|
00000020  40 00 00 00 00 00 00 00  98 00 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  01 00 40 00 03 00 02 00  |....@.8...@.....|
00000040  01 00 00 00 05 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00  |..@.......@.....|
00000060  7e 00 00 00 00 00 00 00  7e 00 00 00 00 00 00 00  |~.......~.......|
00000070  00 00 20 00 00 00 00 00  31 c0 ff c0 cd 80 00 2e  |.. .....1.......|
00000080  73 68 73 74 72 74 61 62  00 2e 74 65 78 74 00 00  |shstrtab..text..|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000d0  00 00 00 00 00 00 00 00  0b 00 00 00 01 00 00 00  |................|
000000e0  06 00 00 00 00 00 00 00  78 00 40 00 00 00 00 00  |........x.@.....|
000000f0  78 00 00 00 00 00 00 00  06 00 00 00 00 00 00 00  |x...............|
00000100  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  01 00 00 00 03 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  7e 00 00 00 00 00 00 00  11 00 00 00 00 00 00 00  |~...............|
00000140  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00                           |........|
00000158

I found documentation on the ELF header and the program header and decoded both of those, but I'm having problems decoding what's after this (starting with 31 c0 ff c0 cd 80 00 2e). Judging by the "shstrtab" text, I am looking at the section table, but what does 31 c0 ff c0 cd 80 00 2e mean? Where is this part documented?

Answer by BarbaraKwarc (best answer)

OK, judging by the information in the first 16 bytes of the header:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
              E  L  F |  |            '--- Pudding :) ---'
                      |  '--- Little-endian (ELFDATA2LSB)
                      '------ 64-bit (ELFCLASS64)

we're dealing with a 64-bit ELF that uses little-endian encoding for multi-byte numbers. The 64-bit ELF header is 64 (0x40) bytes long, so it occupies the first 4 rows of the hex dump. We're interested in these fields in its last two rows:

           Prog Hdr Tab offset      Sect Hdr Tab offset
          .----------^----------.  .----------^----------.
00000020  40 00 00 00 00 00 00 00  98 00 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  01 00 40 00 03 00 02 00  |....@.8...@.....|
                            '-.-'  '-.-' '-.-' '-.-' '-.-'
           PHT entry size  ---'      |     |     |     '-- Sect names in #2
           PHT num entries ----------'     |     '-- SHT num entries
                                           '-------- SHT entry size

So we know that the Program Headers Table starts at offset 0x40 in the file (right after this header) and contains 1 entry of size 0x38 (56 bytes). So it ends at offset 0x40 + 1*0x38 = 0x78 (this is the first byte after this table, and this is also where your "mysterious data" begins, so keep this in mind).
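If you'd rather let a script do the byte counting, here is a minimal Python sketch that unpacks the same 64-byte header with the standard struct module. The format string follows the Elf64_Ehdr field order from the ELF specification; the variable names are my own, and the header bytes are simply copied from the dump in the question:

```python
import struct

# The 64-byte ELF header from the question's dump (file offsets 0x00..0x3F).
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e00010000007800400000000000"
    "40000000000000009800000000000000"
    "00000000400038000100400003000200"
)

# Elf64_Ehdr layout; "<" means little-endian with no padding between fields.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")

hdr = dict(zip(FIELDS, EHDR.unpack(HEADER)))

# Sanity checks on e_ident: magic, ELFCLASS64 (2), ELFDATA2LSB (1).
ident = hdr["e_ident"]
if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
    raise ValueError("not a little-endian 64-bit ELF file")

# End of the program header table = where the "mysterious data" begins.
pht_end = hdr["e_phoff"] + hdr["e_phnum"] * hdr["e_phentsize"]
print(hex(hdr["e_phoff"]), hdr["e_phnum"], hex(hdr["e_phentsize"]))  # 0x40 1 0x38
print(hex(pht_end))  # 0x78
```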

The Section Headers Table starts at offset 0x98 in the file and contains 3 entries of size 0x40 (64 bytes), that is, each entry in SHT takes 4 consecutive rows in a hex editor, and the entire table is 3*4 = 12 such rows, so the offset 0x158 is the first byte after this table. But this is just the end of the file, so there's nothing more after the SHT.
The SHT entry at index 2 (the third and last one) should be the string table that holds the section names.
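The same arithmetic as a small Python sketch. The constants are the values just read from the ELF header, and sht_entry_offset is a helper name I made up for illustration:

```python
# Values read from the ELF header.
E_SHOFF, E_SHENTSIZE, E_SHNUM, E_SHSTRNDX = 0x98, 0x40, 3, 2
FILE_SIZE = 0x158  # total size of the file in the question

def sht_entry_offset(index: int) -> int:
    """File offset of section header number `index`: shoff + index * entsize."""
    if not 0 <= index < E_SHNUM:
        raise IndexError(index)
    return E_SHOFF + index * E_SHENTSIZE

sht_end = E_SHOFF + E_SHNUM * E_SHENTSIZE
print([hex(sht_entry_offset(i)) for i in range(E_SHNUM)])  # ['0x98', '0xd8', '0x118']
print(hex(sht_end), sht_end == FILE_SIZE)                  # 0x158 True
print(hex(sht_entry_offset(E_SHSTRNDX)))                   # 0x118
```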

So let's look at those sections now, shall we?

Section #2

Let's start with section #2, since it is supposed to contain the string table with the names for all the sections, so it will be very useful in further analysis. Here's its header (the last one in the table):

                                    Name index   Type=SHT_STRTAB (bingo!)
                   Flags           .----^----. .----^----.
00000118  .----------^----------.  01 00 00 00 03 00 00 00          |........|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  7e 00 00 00 00 00 00 00  11 00 00 00 00 00 00 00  |~...............|
          '----------.----------'  '----------.----------'
              Starting offset                Size

00000140  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00                           |........|
00000158

So this is indeed a string table (0x03 = SHT_STRTAB). It starts at offset 0x7E in the file and spans 0x11 (17) consecutive bytes, so the first byte after the string table is at offset 0x8F. The bytes from 0x8F up to 0x97 don't belong to any section; they are just zero padding before the section header table at 0x98.
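Here's a minimal sketch of the same decoding with Python's struct module. The format string follows the Elf64_Shdr field order (name, type, flags, addr, offset, size, link, info, addralign, entsize); the Shdr tuple and the small SHT_NAMES table are my own illustrative helpers, and the 64 bytes are copied from offsets 0x118..0x157 of the dump:

```python
import struct
from typing import NamedTuple

# Elf64_Shdr layout, little-endian, 64 bytes.
SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NAMES = {0: "SHT_NULL", 1: "SHT_PROGBITS", 2: "SHT_SYMTAB", 3: "SHT_STRTAB"}

class Shdr(NamedTuple):
    sh_name: int
    sh_type: int
    sh_flags: int
    sh_addr: int
    sh_offset: int
    sh_size: int
    sh_link: int
    sh_info: int
    sh_addralign: int
    sh_entsize: int

# Section header #2, file offsets 0x118..0x157 in the question's dump.
RAW_SHDR2 = bytes.fromhex(
    "0100000003000000"
    "00000000000000000000000000000000"
    "7e000000000000001100000000000000"
    "00000000000000000100000000000000"
    "0000000000000000"
)

sh = Shdr._make(SHDR.unpack(RAW_SHDR2))
print(sh.sh_name, SHT_NAMES[sh.sh_type], hex(sh.sh_offset), hex(sh.sh_size))
print(hex(sh.sh_offset + sh.sh_size))  # 0x8f, first byte after the string table
```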

The string table

So let's see what's in the section containing the string table, so that we can name our sections:

0000007E                                             00 2e                |..|
00000080  73 68 73 74 72 74 61 62  00 2e 74 65 78 74 00     |shstrtab..text.|
0000008F

Here's the string table, with addresses relative to its beginning:

    +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
00: 00 2e 73 68 73 74 72 74 61 62 00 2e 74 65 78 74
10: 00

or the same in ASCII, with the NUL characters marked as ∎:

    +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
00:  ∎  .  s  h  s  t  r  t  a  b  ∎  .  t  e  x  t
10:  ∎

So it holds just three complete strings, at the following relative offsets:

00:  ""             (Just the empty string)
01:  ".shstrtab"    (Name for this section)
0B:  ".text"        (Name for the section that contains the executable code)

(Keep in mind, though, that a name index can also point into the middle of a string, so section names that share a common ending can share bytes: index 0x02 here, for example, would read as "shstrtab".)

We can now verify that this section (#2) is indeed named .shstrtab: its name index was 0x01 after all, wasn't it? ;)
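Looking a name up is just "read from the given index until the next NUL byte". A small sketch (name_at is a made-up helper, and the 17 bytes are the .shstrtab contents from the dump), which also shows the suffix sharing mentioned above:

```python
# Contents of .shstrtab (file offsets 0x7E..0x8E).
SHSTRTAB = b"\x00.shstrtab\x00.text\x00"

def name_at(strtab: bytes, index: int) -> str:
    """Return the NUL-terminated string that starts at byte `index`."""
    end = strtab.index(b"\x00", index)
    return strtab[index:end].decode("ascii")

print(repr(name_at(SHSTRTAB, 0x00)))  # ''
print(name_at(SHSTRTAB, 0x01))        # .shstrtab
print(name_at(SHSTRTAB, 0x0B))        # .text
print(name_at(SHSTRTAB, 0x02))        # shstrtab (a suffix of ".shstrtab")
```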

Section #1

Now let's take apart section #1's header:

                                    Name index   Type=SHT_PROGBITS
                   Flags           .----^----. .----^----.
000000d8  .----------^----------.  0b 00 00 00 01 00 00 00          |........|
000000e0  06 00 00 00 00 00 00 00  78 00 40 00 00 00 00 00  |........x.@.....|
000000f0  78 00 00 00 00 00 00 00  06 00 00 00 00 00 00 00  |x...............|
          '----------.----------'  '----------.----------'
              Starting offset                Size

00000100  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00                           |........|
00000118

So this section is named .text (note the name index 0x0B) and its type is SHT_PROGBITS, so it holds program-defined data; executable code in this case. It starts at offset 0x78 in the file and takes the next 6 bytes, so the first byte after this section is at offset 0x7E (where the string table begins). Here are its contents:

00000070                           31 c0 ff c0 cd 80                |1.....|
0000007E
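Pulling a section's contents out of the file is just slicing at sh_offset for sh_size bytes. To keep this sketch short it only carries rows 0x70..0x8F of the dump, so it subtracts that base offset first; section_bytes is a helper name of my own:

```python
# Rows 0x70..0x8F of the question's dump; BASE is the file offset of ROWS[0].
BASE = 0x70
ROWS = bytes.fromhex(
    "000020000000000031c0ffc0cd80002e"
    "7368737472746162002e746578740000"
)

def section_bytes(image: bytes, base: int, sh_offset: int, sh_size: int) -> bytes:
    """Slice a section's contents out of (part of) the file image."""
    start = sh_offset - base
    return image[start:start + sh_size]

text = section_bytes(ROWS, BASE, 0x78, 6)  # .text: offset 0x78, size 6
print(text.hex(" "))  # 31 c0 ff c0 cd 80
```

The same helper with offset 0x7E and size 0x11 returns the .shstrtab bytes.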

But wait! Remember where your "mysterious data" starts? Yes! It's the 0x78 offset! :) So this "mysterious data" is actually your executable payload :) After decoding it as Intel x86-64 opcodes we get this tiny little program:

31 C0     xor    %eax,%eax     ; Clear EAX to 0 (the short way).
FF C0     inc    %eax          ; Increment EAX, so it now contains 1.
CD 80     int    $0x80         ; Interrupt 0x80: Linux's legacy 32-bit system call entry.

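which is basically equivalent to calling _exit(0) in C ;) The int $0x80 interface expects the system call number in EAX, and 1 is sys_exit in the 32-bit syscall table that this interface uses. The exit status is taken from EBX, which this code never sets; it ends up as 0 only because Linux starts a new process with its general-purpose registers cleared.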
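For a payload this small you can even "disassemble" by table lookup. This sketch is not a real x86 decoder: KNOWN is a hypothetical table that only knows these three 2-byte instructions, written in the same AT&T syntax as above. For anything bigger, use a proper disassembler such as objdump.

```python
# Hypothetical lookup table covering ONLY the three 2-byte instructions in this payload.
KNOWN = {
    b"\x31\xc0": "xor %eax,%eax",
    b"\xff\xc0": "inc %eax",
    b"\xcd\x80": "int $0x80",
}

def decode_payload(code: bytes) -> list[str]:
    """Decode a byte string made up solely of the instructions in KNOWN."""
    out = []
    for i in range(0, len(code), 2):
        chunk = code[i:i + 2]
        if chunk not in KNOWN:
            raise ValueError(f"unsupported bytes at +{i}: {chunk.hex()}")
        out.append(KNOWN[chunk])
    return out

for line in decode_payload(bytes.fromhex("31c0ffc0cd80")):
    print(line)
```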

So yeah, mystery solved :) But let's continue anyway, to learn something more, and this way we'll find out where this piece of code will be loaded in memory.

Section #0

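And finally, section #0. Nothing is actually missing from it: the * line is how hexdump says "the rows here are identical to the previous one", so the whole entry is zeros. That's expected, because entry #0 is always the reserved null section (SHT_NULL). Here's its header: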

00000098                           00 00 00 00 00 00 00 00  |        ........|
*
000000d0  00 00 00 00 00 00 00 00  

But it's of no use to us. Nothing interesting here.
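Now that all three headers are decoded, here's a rough, readelf -S style sketch that walks the whole section header table and resolves each name through .shstrtab. The names (list_sections, TYPES) are my own; the header bytes are copied from the dump, entry #0 is rebuilt as 64 zero bytes because that's what the * line stands for, and only the three type codes that appear in this file are named.

```python
import struct

SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, little-endian
TYPES = {0: "NULL", 1: "PROGBITS", 3: "STRTAB"}
SHSTRTAB = b"\x00.shstrtab\x00.text\x00"   # contents of section #2

# Section header table (file offsets 0x98..0x157); entry #0 is all zeros.
SHT = bytes(64) + bytes.fromhex(
    "0b00000001000000" "06000000000000007800400000000000"
    "78000000000000000600000000000000" "00000000000000000100000000000000"
    "0000000000000000"
    "0100000003000000" "00000000000000000000000000000000"
    "7e000000000000001100000000000000" "00000000000000000100000000000000"
    "0000000000000000"
)

def list_sections(sht: bytes, strtab: bytes) -> list[tuple]:
    """Return (index, name, type, addr, offset, size) for every entry."""
    rows = []
    for idx, (name, typ, _flags, addr, off, size, *_rest) in enumerate(SHDR.iter_unpack(sht)):
        end = strtab.index(b"\x00", name)
        rows.append((idx, strtab[name:end].decode(), TYPES.get(typ, hex(typ)), addr, off, size))
    return rows

for idx, name, typ, addr, off, size in list_sections(SHT, SHSTRTAB):
    print(f"[{idx}] {name or '(null)':10} {typ:9} addr={addr:#x} off={off:#x} size={size:#x}")
```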

Program Headers Table

The last thing left to decode is the Program Headers Table, which (according to the ELF header) starts at offset 0x40 and takes 56 bytes, so the first byte after it is at offset 0x78. Here's the dump:

        Type=PT_LOAD   Flags=RX     Starting offset in file
          .----^----. .----^----.  .----------^----------.
00000040  01 00 00 00 05 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00  |..@.......@.....|
         '----------.----------'  '----------.----------'
              Virtual address         Physical address

               Size in file            Size in memory
          .----------^----------.  .----------^----------.
00000060  7e 00 00 00 00 00 00 00  7e 00 00 00 00 00 00 00  |~.......~.......|
00000070  00 00 20 00 00 00 00 00
00000078  '----------.----------'
                 Alignment

So it says to load the first 126 (0x7E) bytes of the file, starting at file offset 0, into a memory segment of the same size that begins at virtual address 0x400000. Our code starts at file offset 0x78 and ends right before 0x7E, so this segment covers the ELF header, the program header table and our executable payload at the end of it, then stops, ignoring the rest of the file.
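The same decoding in Python, as a sketch: the format string follows the Elf64_Phdr layout (in the 64-bit format p_flags comes right after p_type), p_type 1 is PT_LOAD, and p_flags 5 is PF_R | PF_X. The bytes are the 56 bytes at 0x40..0x77 from the dump; the variable names are my own.

```python
import struct

PHDR = struct.Struct("<IIQQQQQQ")  # Elf64_Phdr, little-endian, 56 bytes
PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

# Program header #0, file offsets 0x40..0x77 in the question's dump.
RAW_PHDR = bytes.fromhex(
    "01000000050000000000000000000000"
    "00004000000000000000400000000000"
    "7e000000000000007e00000000000000"
    "0000200000000000"
)

(p_type, p_flags, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_align) = PHDR.unpack(RAW_PHDR)

# Render the permission bits in the usual R/W/X order.
perms = "".join(c if p_flags & bit else "-"
                for bit, c in ((PF_R, "R"), (PF_W, "W"), (PF_X, "X")))
print(p_type == PT_LOAD, perms)                                  # True R-X
print(hex(p_offset), hex(p_vaddr), hex(p_filesz), hex(p_align))  # 0x0 0x400000 0x7e 0x200000
```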

So if the beginning of the file is loaded at address 0x400000, and our program starts 120 (0x78) bytes from its beginning, it will be located at the address 0x400078 in memory :>
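As a tiny sketch, the file-offset-to-address arithmetic looks like this, using the PT_LOAD values decoded above (file_offset_to_vaddr is a name of my own):

```python
# PT_LOAD values read from the program header.
P_OFFSET, P_VADDR, P_FILESZ = 0x0, 0x400000, 0x7E

def file_offset_to_vaddr(off: int) -> int:
    """Map a file offset covered by the PT_LOAD segment to its virtual address."""
    if not P_OFFSET <= off < P_OFFSET + P_FILESZ:
        raise ValueError(f"offset {off:#x} is not loaded by this segment")
    return P_VADDR + (off - P_OFFSET)

print(hex(file_offset_to_vaddr(0x78)))  # 0x400078 (start of .text)
```

Trying it on 0x7E (the start of the string table) raises an error, because the segment ends right before that byte.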

Now let's see what entry point is specified in the ELF header for our program:

    Executable  x86-64  Version=1   Program's entry point
          .-^-. .-^-. .----^----.  .----------^----------.
00000010  02 00 3e 00 01 00 00 00  78 00 40 00 00 00 00 00  |..>.....x.@.....|

Bingo! :> It's 0x400078, so it points at the start of our little piece of code in the memory image.

And that's all, folks! ;)