I'm in serious trouble with a heap/stack corruption. To be able to set a data breakpoint and find the root of the problem, I want to take two core dumps using gdb and then compare them: the first one when I think the heap and stack are still OK, and the second one shortly before my program crashes.
How can I compare those dumps?
Information about my project:
- Using GCC 5.x
- Plugin for a legacy third-party program with RT support. The program's sources are not available to me.
- The legacy project is C; my plugin is C++.
Other things I tried:
- Using address sanitizers -> won't work because the legacy program won't start with them.
- Using undefined behavior sanitizers -> same
- Figuring out which memory gets corrupted, to set a data breakpoint on it -> no success, because the corrupted memory does not belong to my code.
- Ran Valgrind -> no errors around my code.
Thank you for your help.
Regardless of your underlying motivation, I'd like to get into your actual question: how the differences between two core dumps can be identified. This is going to be lengthy, but will hopefully give you your answer.
A core dump is an ELF file that contains metadata and a specific set of the memory regions that were mapped into the process at the time the dump was created (on Linux, which regions are included can be controlled via /proc/[pid]/coredump_filter). The obvious way to compare two dumps would be to compare their hex representations, for example by running both files through `hexdump` and comparing the outputs with `diff`.
The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the file offset of a changed value to the corresponding address in the process's virtual address space.
So, more context is needed. The optimal output would be a list of VM addresses with their before and after values.
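Going from a file offset back to a VM address requires the dump's ELF program headers, which a hex diff ignores: each `PT_LOAD` segment records where its bytes start in the file (`p_offset`), how many of them were written (`p_filesz`) and the virtual address they were mapped at (`p_vaddr`). Below is a minimal sketch of that translation, assuming a 64-bit little-endian core (x86_64, as in the question) already read into memory; `core_offset_to_vaddr` is a hypothetical helper, not part of any library.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Translate a file offset inside a 64-bit little-endian ELF core image
 * (already read into memory) into the virtual address the byte was dumped
 * from. Assumes the dump's byte order matches the host (x86_64 here).
 * Returns 0 and sets *vaddr on success, -1 otherwise. */
int core_offset_to_vaddr(const unsigned char *img, size_t len,
                         uint64_t off, uint64_t *vaddr)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);      /* copy to avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return -1;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        uint64_t pos = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        if (pos + sizeof ph > len)
            return -1;
        memcpy(&ph, img + pos, sizeof ph);
        /* Only PT_LOAD segments carry memory contents. p_filesz may be
         * smaller than p_memsz (even 0) for regions that were not dumped. */
        if (ph.p_type == PT_LOAD &&
            off >= ph.p_offset && off - ph.p_offset < ph.p_filesz) {
            *vaddr = ph.p_vaddr + (off - ph.p_offset);
            return 0;
        }
    }
    return -1;   /* offset lies in headers, notes, or outside any segment */
}
```

Feeding each offset reported by a hex diff through such a helper recovers the VM address, but you still have to work out what lives at that address.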
Before we can get there, we need a test scenario that loosely resembles yours. My test application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation of the same size hides the issue). The idea is to create a core dump with gdb's generate-core-file command (`generate` for short) during each phase of the program, with the phases marked by breakpoints in the code.
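Now, the dumps can be generated by running the application under gdb and creating one dump (dump1, dump2, dump3) each time a breakpoint is hit.
A quick manual inspection of the three dumps shows the relevant differences.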
Based on that inspection, we can clearly see that `*g_state` changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison. Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do: compare the contents of the `PROGBITS` sections of the two dumps, which hold the dumped memory.
Based on `elf.h`, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two `hexdump` outputs using `diff`. The sample makes some assumptions (x86_64; mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity. With the sample implementation, we can re-evaluate our scenario above, starting with the first diff (dump1 vs. dump2).
The first diff shows that `*g_state` (address `0x602260`) was changed from `0x7fffffffe2bc` to `0x4008c1`. The second diff (dump2 vs. dump3), reduced to the relevant offset, shows that `*g_state` (address `0x602260`) was changed from `0x4008c1` to `0x1`.
There you have it, a core dump diff. Now, whether or not it proves useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will probably be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.
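The comparison at the heart of such a tool fits in a few dozen lines on top of `elf.h`. The following is not the sample implementation mentioned above, only a minimal sketch under the same kind of assumptions: x86_64 (little-endian, 8-byte pointers), section headers present in both dumps, and `PROGBITS` sections that either match in address and size or exist in only one dump (this sketch simply skips the latter). `diff_cores`, `diff_cb` and `print_change` are hypothetical names.

```c
#include <elf.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Called once per changed 8-byte word with its VM address and both values. */
typedef void (*diff_cb)(uint64_t addr, uint64_t before, uint64_t after,
                        void *ctx);

/* Copy and validate the ELF header: 64-bit, little-endian (x86_64). */
static int load_ehdr(const unsigned char *img, size_t len, Elf64_Ehdr *eh)
{
    if (len < sizeof *eh)
        return -1;
    memcpy(eh, img, sizeof *eh);
    return (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
            eh->e_ident[EI_CLASS] == ELFCLASS64 &&
            eh->e_ident[EI_DATA] == ELFDATA2LSB) ? 0 : -1;
}

/* Copy section header i out of the image; -1 if it lies outside it. */
static int load_shdr(const unsigned char *img, size_t len,
                     const Elf64_Ehdr *eh, unsigned i, Elf64_Shdr *sh)
{
    uint64_t pos = eh->e_shoff + (uint64_t)i * eh->e_shentsize;
    if (pos + sizeof *sh > len)
        return -1;
    memcpy(sh, img + pos, sizeof *sh);
    return 0;
}

/* Compare the SHT_PROGBITS sections of two core images word by word.
 * Sections are paired by identical sh_addr and sh_size; sections present in
 * only one dump are skipped, as are trailing bytes that don't fill a word.
 * Returns the number of changed words, or -1 for a malformed image. */
long diff_cores(const unsigned char *a, size_t alen,
                const unsigned char *b, size_t blen, diff_cb cb, void *ctx)
{
    Elf64_Ehdr ea, eb;
    if (load_ehdr(a, alen, &ea) != 0 || load_ehdr(b, blen, &eb) != 0)
        return -1;

    long changes = 0;
    for (unsigned i = 0; i < ea.e_shnum; i++) {
        Elf64_Shdr sa;
        if (load_shdr(a, alen, &ea, i, &sa) != 0)
            return -1;
        if (sa.sh_type != SHT_PROGBITS)
            continue;
        for (unsigned j = 0; j < eb.e_shnum; j++) {
            Elf64_Shdr sb;
            if (load_shdr(b, blen, &eb, j, &sb) != 0)
                return -1;
            if (sb.sh_type != SHT_PROGBITS || sb.sh_addr != sa.sh_addr ||
                sb.sh_size != sa.sh_size)
                continue;
            if (sa.sh_offset + sa.sh_size > alen ||
                sb.sh_offset + sb.sh_size > blen)
                return -1;
            /* 8-byte steps: pointer size on x86_64. */
            for (uint64_t k = 0; k + 8 <= sa.sh_size; k += 8) {
                uint64_t before, after;
                memcpy(&before, a + sa.sh_offset + k, 8);
                memcpy(&after, b + sb.sh_offset + k, 8);
                if (before != after) {
                    cb(sa.sh_addr + k, before, after, ctx);
                    changes++;
                }
            }
            break;   /* matching section found and compared */
        }
    }
    return changes;
}

/* Example callback: one line per change, "address: before -> after". */
void print_change(uint64_t addr, uint64_t before, uint64_t after, void *ctx)
{
    (void)ctx;
    printf("0x%" PRIx64 ": 0x%" PRIx64 " -> 0x%" PRIx64 "\n",
           addr, before, after);
}
```

With `print_change` as the callback, a change like the one in the first diff is printed as `0x602260: 0x7fffffffe2bc -> 0x4008c1`. A custom callback is also a convenient place to drop changes outside the address ranges you care about.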
The more context you have, the easier the analysis will be. For example, the scope of the diff could be reduced by limiting it to addresses in the `.data` and `.bss` sections of the library in question, if changes in there are relevant to your situation. Another approach to reducing the scope is to exclude changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent, but based on the addresses of the changes in your initial diff, you could search for pointers in the library's `.data` and `.bss` sections right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, and register and stack references of library-owned threads), but it's a start.
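The pointer search could look roughly like the sketch below. It assumes you have already copied the bytes of the library's `.data`/`.bss` range out of a dump (not shown) and that pointers are stored 8-byte aligned, which is typical on x86_64 but not guaranteed (e.g. packed structures). Since a pointer usually refers to the start of an allocation rather than to the exact changed word, the search takes an address range `[lo, hi)`, for instance the bounds of the heap block containing a changed address, which you would have to determine separately. `find_refs` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan a copy of a library's .data/.bss bytes for aligned 8-byte words whose
 * value lies in [lo, hi). region_vaddr is the VM address of region[0] and is
 * assumed to be 8-byte aligned. The VM addresses of up to max_hits referencing
 * words are stored in hits[]; the return value is the total number of hits. */
size_t find_refs(const unsigned char *region, size_t region_len,
                 uint64_t region_vaddr, uint64_t lo, uint64_t hi,
                 uint64_t *hits, size_t max_hits)
{
    size_t n = 0;
    for (size_t k = 0; k + 8 <= region_len; k += 8) {
        uint64_t v;
        memcpy(&v, region + k, 8);   /* avoid unaligned/aliasing issues */
        if (v >= lo && v < hi) {
            if (n < max_hits)
                hits[n] = region_vaddr + k;
            n++;
        }
    }
    return n;
}
```

Each hit is the address of a word in the library's data that points into the range, i.e. a reason to keep the corresponding change in the diff. References held only in registers, on thread stacks or via other heap blocks are still missed, as noted above.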