Loading of ELF executable

On pages 2-7 and 2-8 of the ELF file format specification, there are two figures that give an example of an executable's program headers and show how its segments are loaded into memory:

[Figures: the example executable's program headers, and the layout of its segments in memory]

The specification explains:

Although the example’s file offsets and virtual addresses are congruent modulo 4 KB for both text and data, up to four file pages hold impure text or data (depending on page size and file system block size).

  • The first text page contains the ELF header, the program header table, and other information.
  • The last text page holds a copy of the beginning of data.
  • The first data page has a copy of the end of text.
  • The last data page may contain file information not relevant to the running process.

My questions are:

  1. What do the i-th "text page" and "data page" mean?
  2. What do the 2nd and 3rd items in the above four statements mean?
  3. Why does the data padding appear right after the text segment, while the text padding appears before the data segment, making an interleaved layout?
  4. What if the executable has more than two segments (other than text and data) to load?

Best answer, by o11c:

A page is the smallest mappable unit of virtual memory. If you are not familiar with the basics, see the Wikipedia article on virtual memory. On most common systems (including x86), a page is 4096 bytes, or 0x1000 in hex.
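Because pages are the unit of mapping, a loader constantly rounds addresses down or up to page boundaries. A minimal sketch of that arithmetic, assuming 4 KB pages as in the spec's example (`page_floor` and `page_ceil` are hypothetical helper names; `mmap.PAGESIZE` is only printed to show the real page size of the machine):

```python
import mmap

# The specification's example assumes 4 KB pages; mmap.PAGESIZE reports
# the page size of the machine this runs on.
PAGE_SIZE = 0x1000


def page_floor(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Start of the page containing addr (page_size must be a power of two)."""
    return addr & ~(page_size - 1)


def page_ceil(addr: int, page_size: int = PAGE_SIZE) -> int:
    """First page boundary at or after addr."""
    return (addr + page_size - 1) & ~(page_size - 1)


if __name__ == "__main__":
    print(f"page size on this machine: {mmap.PAGESIZE:#x}")
    print(f"{page_floor(0x8048100):#x}")  # 0x8048000
    print(f"{page_ceil(0x8073f00):#x}")   # 0x8074000
```

Masking with `~(page_size - 1)` only works because page sizes are powers of two.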

A "text page" contains executable code; a "data page" contains data. These must be mapped at fixed addresses so that the addresses used by the code are correct. In shared libraries and position-independent executables, the exact virtual addresses are no longer fixed, but the segments' positions relative to each other are. In this example, the 0th text page runs from 0x8048000 to 0x8048fff, so it begins 0x100 bytes before the start of the text segment (at 0x8048100). The 1st text page runs from 0x8049000 to 0x8049fff. The last text page runs from 0x8073000 to 0x8073fff, which extends past the end of the text segment (at 0x8073eff).
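The page numbering above can be reproduced from just the text segment's first and last byte (0x8048100 and 0x8073eff, as quoted in this answer). A small sketch, assuming 4 KB pages; `pages_spanned` is a hypothetical helper:

```python
PAGE_SIZE = 0x1000  # 4 KB, as in the spec's example


def pages_spanned(start: int, end_inclusive: int, page_size: int = PAGE_SIZE):
    """(first_byte, last_byte) of every page touched by [start, end_inclusive]."""
    first = start - start % page_size
    last = end_inclusive - end_inclusive % page_size
    return [(p, p + page_size - 1) for p in range(first, last + 1, page_size)]


# Text segment bounds from the example: 0x8048100 .. 0x8073eff.
text_pages = pages_spanned(0x8048100, 0x8073eff)
print(f"{len(text_pages)} text pages")
for i in (0, 1, len(text_pages) - 1):
    lo, hi = text_pages[i]
    print(f"text page {i}: {lo:#x}-{hi:#x}")
```

Only the first and last of these pages are "impure": every page in between holds nothing but text.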

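The first data page starts at 0x8074000, but the data segment itself doesn't start until 0x8074f00. This page is backed by the same page of the file as the last text page, but it has to be mapped separately because it needs different permissions (typically PROT_READ|PROT_EXEC for text versus PROT_READ|PROT_WRITE for data). As a result, the last text page contains a copy of the beginning of the data, and the first data page contains a copy of the end of the text; this is what the specification means by those two items, and it is why the figure shows "data padding" at the end of the text mapping and "text padding" at the start of the data mapping.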
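To see why one file page ends up mapped twice, here is a sketch of the page-aligned mmap arguments a loader could compute for each segment. The file offsets are assumptions taken from the spec's example figure rather than from this answer: text at file offset 0x100, and data immediately after it at 0x100 + 0x2be00 = 0x2bf00 (data file size 0x4e00). `Segment` and `mapping` are hypothetical names, and the permission strings are illustrative:

```python
from dataclasses import dataclass

PAGE_SIZE = 0x1000  # 4 KB pages, as in the specification's example


@dataclass
class Segment:
    name: str
    vaddr: int    # p_vaddr
    offset: int   # p_offset
    filesz: int   # p_filesz
    perms: str    # illustrative permission string, e.g. "r-x" or "rw-"


def mapping(seg: Segment, page_size: int = PAGE_SIZE):
    """Page-aligned (addr, file_offset, length) covering seg's file-backed bytes."""
    # p_vaddr and p_offset must be congruent modulo the page size,
    # otherwise no single page-aligned mmap can place the segment.
    assert seg.vaddr % page_size == seg.offset % page_size
    slack = seg.vaddr % page_size             # bytes on the first page before the segment
    addr = seg.vaddr - slack
    file_off = seg.offset - slack
    end = seg.vaddr + seg.filesz
    length = -(-(end - addr) // page_size) * page_size  # round up to whole pages
    return addr, file_off, length


# Assumed values from the specification's example (not stated in the answer).
text = Segment("text", 0x8048100, 0x100, 0x2be00, "r-x")
data = Segment("data", 0x8074f00, 0x2bf00, 0x4e00, "rw-")

t_addr, t_off, t_len = mapping(text)
d_addr, d_off, d_len = mapping(data)

# File bytes that fall inside both mappings.
shared = range(max(t_off, d_off), min(t_off + t_len, d_off + d_len))
print(f"file {shared.start:#x}..{shared.stop - 1:#x} is mapped at "
      f"{t_addr + shared.start - t_off:#x} ({text.perms}) and "
      f"{d_addr + shared.start - d_off:#x} ({data.perms})")
```

The shared file page lands at 0x8073000 (the last text page) and at 0x8074000 (the first data page), matching the addresses above. A real loader would also zero-fill the part of the data segment between p_filesz and p_memsz (the .bss), which this sketch ignores.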

If there are more than two segments, they are loaded in exactly the same way. The names "text" and "data" are arbitrary; what matters are the flags and addresses specified in each segment's program header. You can view this information with readelf -l or objdump -p.
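Since the loader only looks at the PT_LOAD program headers, listing them is enough to see how any number of segments will be mapped. A rough sketch of what `readelf -l` reports, assuming a 64-bit little-endian ELF file (e.g. x86-64 Linux); the spec's own example is 32-bit, which uses a different header layout. `load_segments` and `print_load_segments` are hypothetical names:

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
ELF64_PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian


def load_segments(image: bytes):
    """Yield (vaddr, offset, filesz, memsz, perms) for every PT_LOAD program header."""
    fields = ELF64_EHDR.unpack_from(image, 0)
    ident = fields[0]
    # EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (ELFDATA2LSB)
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = ELF64_PHDR.unpack_from(image, phoff + i * phentsize)
        if p_type != PT_LOAD:
            continue  # PT_INTERP, PT_NOTE, PT_DYNAMIC, ... are not mapped on their own
        perms = "".join(c if p_flags & bit else "-"
                        for c, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
        yield p_vaddr, p_offset, p_filesz, p_memsz, perms


def print_load_segments(path: str) -> None:
    """Rough equivalent of the LOAD lines printed by `readelf -l path`."""
    with open(path, "rb") as f:
        for vaddr, off, filesz, memsz, perms in load_segments(f.read()):
            print(f"LOAD off={off:#x} vaddr={vaddr:#x} "
                  f"filesz={filesz:#x} memsz={memsz:#x} {perms}")
```

Calling `print_load_segments("/bin/cat")` on an x86-64 Linux system should print one line per loadable segment, however many there are; each is mapped by the same page-rounding rule regardless of what it contains.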

Note that in practice there is usually an unmapped space (a "hole") between the text and data segments, though not necessarily between read-only and read-write data, or between initialized and uninitialized data.

For example, running cat /proc/self/maps gives me:

ben@joyplim ~ % cat /proc/self/maps
00400000-0040c000 r-xp 00000000 fe:01 36176026                           /bin/cat
0060b000-0060c000 r--p 0000b000 fe:01 36176026                           /bin/cat
0060c000-0060d000 rw-p 0000c000 fe:01 36176026                           /bin/cat
<plus the heap, stack, library, and special kernel stuff>
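Both effects are visible in the three /bin/cat lines above. As a sketch, those lines can be parsed using the standard /proc/<pid>/maps field order (address range, permissions, file offset, device, inode, path). `parse_maps`, `holes` and `shared_file_pages` are hypothetical helpers that only handle listings shaped like this one:

```python
# The three /bin/cat lines from the listing above; other mappings omitted.
MAPS = """\
00400000-0040c000 r-xp 00000000 fe:01 36176026                           /bin/cat
0060b000-0060c000 r--p 0000b000 fe:01 36176026                           /bin/cat
0060c000-0060d000 rw-p 0000c000 fe:01 36176026                           /bin/cat
"""


def parse_maps(text: str):
    """Return (start, end, perms, file_offset, path) for each /proc/<pid>/maps line."""
    entries = []
    for line in text.splitlines():
        # address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        entries.append((start, end, fields[1], int(fields[2], 16), path))
    return entries


def holes(entries):
    """Unmapped gaps (start, end) between consecutive mappings (input sorted by address)."""
    return [(a[1], b[0]) for a, b in zip(entries, entries[1:]) if b[0] > a[1]]


def shared_file_pages(entries):
    """(lo, hi, perms_a, perms_b) for file byte ranges backing two mappings of one file."""
    shared = []
    for i, (s0, e0, p0, off0, path0) in enumerate(entries):
        for s1, e1, p1, off1, path1 in entries[i + 1:]:
            # Skip anonymous and pseudo mappings such as [heap] or [stack].
            if not path0.startswith("/") or path0 != path1:
                continue
            lo = max(off0, off1)
            hi = min(off0 + (e0 - s0), off1 + (e1 - s1))
            if lo < hi:
                shared.append((lo, hi, p0, p1))
    return shared


if __name__ == "__main__":
    maps = parse_maps(MAPS)
    for lo, hi in holes(maps):
        print(f"hole: {lo:#x}-{hi:#x} ({hi - lo:#x} bytes unmapped)")
    for lo, hi, pa, pb in shared_file_pages(maps):
        print(f"file bytes {lo:#x}-{hi - 1:#x} are mapped twice: {pa} and {pb}")
```

For this listing it reports a 0x1ff000-byte hole between 0x40c000 and 0x60b000, and shows that file offsets 0xb000-0xbfff back both the end of the r-xp mapping and the r--p mapping: the same kind of shared boundary page as in the specification's example.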