I have a question about offsets in files. Take a ".exe" file, for instance, which you open like this:
handle = open('file.exe', mode='rb')
To access the byte at offset 10, you can use the seek function:
handle.seek(10, 0)
Many of the values in the PE header are RVAs, meaning ImageBase + RVA is the address once the file is loaded in memory. The problem is that you can't seek with this value. For instance:
The .idata section has a Virtual Address (an RVA) and a Raw Address (file based). With the previous method you can use the raw address to read at the right offset. For many values, however, only the RVA is given, and seeking with it doesn't work.
A file opened this way starts at offset 0, whereas when it is loaded in memory the image base is usually 0x00400000. Is there a way to load the file into memory so that I can use the same offsets it has when loaded? In other words, the file would start at the image base instead of 0, so that I can seek to the RVAs.
Kind regards,
If you are on Windows, you could have the Image Loader load (i.e. map) the PE image into memory for you. You would then be able to use each RVA directly, relative to the base address at which the image was loaded, to locate the data you want.
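As a rough illustration of the loader approach, here is a minimal sketch using ctypes and the Win32 LoadLibraryExW function with the DONT_RESOLVE_DLL_REFERENCES flag, which maps the image without loading its imports or running its entry point. The helper names load_image and read_rva are my own, not part of any library. I am also assuming that the Python process has the same bitness as the target file, and that the returned base may differ from the preferred image base in the header, so the RVA is added to the returned value rather than to 0x00400000.

```python
import ctypes
import sys

# Flag value from the Win32 headers: map the image but do not
# load its imports or call its entry point.
DONT_RESOLVE_DLL_REFERENCES = 0x00000001

def load_image(path):
    """Map a PE file with the Windows loader and return its base address."""
    if sys.platform != "win32":
        raise OSError("load_image requires Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.LoadLibraryExW.restype = ctypes.c_void_p
    kernel32.LoadLibraryExW.argtypes = [
        ctypes.c_wchar_p,  # lpLibFileName
        ctypes.c_void_p,   # hFile (reserved, must be NULL)
        ctypes.c_uint32,   # dwFlags
    ]
    base = kernel32.LoadLibraryExW(path, None, DONT_RESOLVE_DLL_REFERENCES)
    if not base:
        raise ctypes.WinError(ctypes.get_last_error())
    return base

def read_rva(base, rva, size):
    """Read `size` bytes at `rva`, relative to the mapped image base."""
    return ctypes.string_at(base + rva, size)
```

Something like read_rva(load_image('file.exe'), rva, 16) would then read straight from the mapped image. The module should be released with FreeLibrary once you are done with it.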
Another option is to implement a few functions to parse the PE header and section table. The section table contains information about each section in the PE file, such as where it is located in the file (the raw file offset), and where it should be unpacked in memory, relative to the image base address. Via the section table, you could then write a function that translates an RVA into a corresponding raw file offset.
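The translation described above could be sketched roughly as follows. The names Section, parse_sections and rva_to_offset are made up for this example; the field offsets follow the published PE/COFF layout (e_lfanew at 0x3C, a 20-byte COFF header after the 4-byte signature, 40-byte section table entries). It does not handle RVAs that fall inside the headers, and it treats anything beyond a section's raw data as unmappable.

```python
import struct
from collections import namedtuple

Section = namedtuple(
    "Section", "name virtual_address virtual_size raw_offset raw_size"
)

def parse_sections(data):
    """Return the section table entries of the PE image in `data` (bytes)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header starts right after the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table = pe_offset + 24 + opt_header_size
    sections = []
    for i in range(num_sections):
        # Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        name = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append(Section(name, vaddr, vsize, raw_ptr, raw_size))
    return sections

def rva_to_offset(sections, rva):
    """Translate an RVA to a raw file offset, or None if not file-backed."""
    for s in sections:
        delta = rva - s.virtual_address
        if 0 <= delta < s.raw_size:
            return s.raw_offset + delta
    return None
```

With the file contents read into data, handle.seek(rva_to_offset(parse_sections(data), rva), 0) would position the handle from the question at the bytes that end up at that RVA, provided the result is not None.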
The one thing to be aware of is that not every RVA can be mapped back to a file offset. For instance, many sections have a region of zero-initialized data at the end, which is not explicitly represented in the binary file. Instead, at load time the loader pads out such sections with zeros, according to the virtual size of each section (also found in the section table entries).
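To make the zero-padding case concrete, here is a small sketch of a reader that imitates what the loader does for a single section. The function read_at_rva and the tuple layout (virtual address, virtual size, raw file offset, raw size) are assumptions made for this example; the four values correspond to the fields of a section table entry.

```python
def read_at_rva(data, section, rva, size):
    """Read `size` bytes at `rva` from one section, zero-filling the
    part of the range that has no backing bytes in the file.

    `section` is (virtual_address, virtual_size, raw_offset, raw_size).
    """
    vaddr, vsize, raw_offset, raw_size = section
    start = rva - vaddr
    if start < 0 or start + size > vsize:
        raise ValueError("range is outside the section")
    if start < raw_size:
        # Part (or all) of the range is stored in the file
        end = min(start + size, raw_size)
        backed = data[raw_offset + start:raw_offset + end]
    else:
        # The whole range lies in the zero-initialized tail
        backed = b""
    return backed + b"\0" * (size - len(backed))
```

A read that starts inside the raw data and runs past its end returns the stored bytes followed by zeros, which is what you would see in memory after loading.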