I'm trying to better understand the PE format, and I'm wondering what the relationship between sections and data directories is in a PE file. Opening up a PE file I notice that they often overlap, but I'm not clear on why, or how they relate, and Microsoft's official PE file format spec doesn't really seem to make this any clearer.
I understand that the name value of a section header can be changed and so isn't a guaranteed reference to a specific block, and that as such data directories should be relied on for finding a specific block within the file.
In an example PE file I have opened I notice that the .text section has the same offset as the Import Address Table data directory header, though the IAT size is listed as 8, whilst the .text section size is 6804. In contrast the resource data directory header states that it starts at 16384, and is 1568 in length - tallying precisely with the entries for the .rsrc section. The latter makes sense to me, the former doesn't.
So what are the differing purposes of sections vs. data directories? Why do both concepts exist, and why do they sometimes overlap where it doesn't appear to make sense for them to do so?
Sections are meant to package things with "nearly" the same memory protections.
For example let's take calc.exe:
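The code section here has its section protection (IMAGE_SECTION_HEADER.Characteristics) set to 0x60000020, which marks it as code that is executable and readable.
The .idata section (import section) has a value of 0x40000040, which marks it as initialized data that is readable only.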
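To make the two values above easier to read, here is a minimal sketch that decodes a Characteristics value into flag names. The flag constants are the ones defined in winnt.h (only a subset is listed); the table name SECTION_FLAGS and the helper decode_characteristics are made up for this illustration.

```python
# Subset of the section characteristic flags defined in winnt.h.
SECTION_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x00000080: "IMAGE_SCN_CNT_UNINITIALIZED_DATA",
    0x02000000: "IMAGE_SCN_MEM_DISCARDABLE",
    0x10000000: "IMAGE_SCN_MEM_SHARED",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
    0x80000000: "IMAGE_SCN_MEM_WRITE",
}


def decode_characteristics(value):
    """Return the names of the known flags set in a Characteristics value.

    Hypothetical helper for illustration; bits not in SECTION_FLAGS are ignored.
    """
    return [name for bit, name in SECTION_FLAGS.items() if value & bit]


# Code section from the example above.
print(decode_characteristics(0x60000020))
# ['IMAGE_SCN_CNT_CODE', 'IMAGE_SCN_MEM_EXECUTE', 'IMAGE_SCN_MEM_READ']

# Import (.idata) section from the example above.
print(decode_characteristics(0x40000040))
# ['IMAGE_SCN_CNT_INITIALIZED_DATA', 'IMAGE_SCN_MEM_READ']
```

Neither value has IMAGE_SCN_MEM_WRITE set, so both sections are read-only once mapped.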
In some cases, the linker might decide that different sections will get the same memory protections and merge them together (you can force this by using the /MERGE linker option).
Citing Matt Pietrek from his wonderful two-part article "An In-Depth Look into the Win32 Portable Executable File Format":
This is usually true if the sections share the same IMAGE_SCN_MEM_READ/IMAGE_SCN_MEM_WRITE protections: that's why in some cases you might find the import table inside the code section (even though the import table is obviously not meant to be executed). As you can only read the code and import sections (you can't write to them), that's enough for the linker to merge them into the same section.
From the same article:
AFAIK, the resource (.rsrc) and relocation (.reloc) sections are always left alone. The reason for the resource section to be left alone might be because some APIs rely on it.
On the other hand, data directories tell you where to find important parts of the PE file (import, export, debug, TLS, resources, relocations, etc.), so even if different sections are merged, you can still find the relevant piece of data.
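As a rough illustration of how the two concepts fit together, the sketch below looks up which section contains a given data directory. It assumes the directory address is an RVA, as it is for most directory entries. The sizes are the ones from the question, but the .text start address (0x2000) and the names Section, Directory and containing_section are assumptions made up for this example, not values read from a real file.

```python
from collections import namedtuple

# Minimal stand-ins for IMAGE_SECTION_HEADER and IMAGE_DATA_DIRECTORY fields.
Section = namedtuple("Section", "name virtual_address virtual_size")
Directory = namedtuple("Directory", "name rva size")


def containing_section(directory, sections):
    """Return the section whose virtual range fully holds the directory, or None."""
    if directory.size == 0:
        return None  # an empty directory entry means "not present"
    start = directory.rva
    end = directory.rva + directory.size
    for sec in sections:
        sec_end = sec.virtual_address + sec.virtual_size
        if sec.virtual_address <= start and end <= sec_end:
            return sec
    return None


# Hypothetical layout: sizes from the question, .text start address assumed.
sections = [
    Section(".text", 0x2000, 6804),
    Section(".rsrc", 0x4000, 1568),
]
directories = [
    Directory("IAT", 0x2000, 8),           # small table at the start of .text
    Directory("Resource", 0x4000, 1568),   # covers the whole .rsrc section
]

for d in directories:
    sec = containing_section(d, sections)
    print(d.name, "->", sec.name if sec else "not mapped")
# IAT -> .text
# Resource -> .rsrc
```

With this layout the IAT is just an 8-byte region at the start of a merged .text section, while the resource directory happens to span its own section exactly, which matches the two cases described in the question.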