How to Compare Core Dumps for Simple Time Travel Debugging

How can the difference between two Linux core dumps be identified, and why would this even come up? This is going to be lengthy, but it should answer both questions.

The Case for Comparing Core Dumps

Comparing two core dumps is only meaningful if they represent the same process at different points in time. If that’s the case, they could be thought of as process snapshots. Consider an application that triggers a segmentation fault after a random uptime. If the root cause is suspected to be memory corruption and post-mortem debugging does not provide any hints, it would be helpful to go back in time to inspect the memory state before the fatal error.

Ideally, all of that should be done with minimal overhead because, in our thought experiment, the issue only occurs in production. Also, the actual memory locations of interest are unknown, so being able to visualize relevant memory changes before the fatal error would be desirable. A set of core dumps could provide simple, low-overhead time travel debugging with respect to process memory.

In most real-world debugging scenarios involving memory corruption, a memory diff would be too large to be useful. In specific cases involving mostly read-only memory and a limited set of debugging alternatives, going the diff route might be just what's needed to identify the combination of circumstances leading to the corruption. With all that talk about comparing two core dumps, how can this diff even be generated?

Naive Comparison

A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.
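
To see which kinds of mappings a dump will contain, the filter value can be decoded. The following is a minimal Python sketch, assuming the bit meanings documented in the core(5) manual page; the function name decode_coredump_filter is made up for this illustration.

```python
# Bit meanings as documented in core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(text):
    """Return the mapping types enabled by a hexadecimal filter value."""
    mask = int(text.strip(), 16)
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if mask & (1 << bit)]
```

On Linux, passing the contents of /proc/self/coredump_filter to this function lists what would be dumped for the current process.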

The obvious way to compare the dumps would be to compare their hex representations, for example by running diff on the hexdump output of both files.
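
As a rough illustration of such a naive comparison, here is a minimal Python sketch that compares two files chunk by chunk and reports file offsets. It is not the command used for this article; naive_diff and its width parameter are names chosen for the example.

```python
def naive_diff(dump_a, dump_b, width=16):
    """Compare two byte strings chunk by chunk, like diffing hexdump output.

    Returns (file_offset, hex_a, hex_b) for every chunk that differs.
    """
    changes = []
    for offset in range(0, max(len(dump_a), len(dump_b)), width):
        chunk_a = dump_a[offset:offset + width]
        chunk_b = dump_b[offset:offset + width]
        if chunk_a != chunk_b:
            changes.append((offset, chunk_a.hex(), chunk_b.hex()))
    return changes
```

Note that the offsets reported here are positions in the file, not addresses in the process.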

The result is rarely useful because you’re missing the context. More specifically, there’s no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.

So, more context is needed. The optimal output would be a list of VM addresses, each with its before and after values.
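
The missing piece is a mapping from file offsets to virtual addresses, which the ELF headers of the dump provide. A minimal Python sketch of that translation follows; the Region tuple and the example layout are hypothetical and only illustrate the arithmetic.

```python
from collections import namedtuple

# One dumped memory region: where it sits in the file and in the process.
Region = namedtuple("Region", "file_offset vaddr size")


def offset_to_vaddr(regions, file_offset):
    """Translate a core file offset to a virtual address, or return None."""
    for region in regions:
        if region.file_offset <= file_offset < region.file_offset + region.size:
            return region.vaddr + (file_offset - region.file_offset)
    return None


# Hypothetical layout: two regions of 0x1000 bytes each.
example_regions = [
    Region(0x1000, 0x400000, 0x1000),
    Region(0x2000, 0x602000, 0x1000),
]
```

With this made-up layout, file offset 0x2260 falls into the second region and translates to address 0x602260.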

Creating a Test Scenario

Before we can get to that, we need a test scenario to validate our comparison approach. The following sample includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation with the same size hides the issue). The idea here is to create a core dump using GDB (generate-core-file) during each phase based on breakpoints triggered by the code:

  1. dump1: Correct state
  2. dump2: Incorrect state, no segmentation fault
  3. dump3: Segmentation fault

The sample code:

Now, the dumps can be generated:

A quick manual inspection shows the relevant differences:

Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we’d like to automate this comparison.

Context-Aware Comparison

Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we’ll do:

  1. Open a dump
  2. Identify PROGBITS sections of the dump
  3. Remember the data and address information
  4. Repeat the process with the second dump
  5. Compare the two data sets and print the diff
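
The steps above can be sketched in a few lines of Python. This is not the sample implementation discussed in this article, only an illustration under stated assumptions: little-endian ELF64 dumps that carry a section header table (a dump without one would need the PT_LOAD program headers instead), and mappings that match in address and size. The names load_progbits and diff_regions are made up for the sketch.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, little-endian
SHT_PROGBITS = 1


def load_progbits(path):
    """Map start address -> bytes for every PROGBITS section of an ELF64 file."""
    with open(path, "rb") as f:
        data = f.read()
    ehdr = EHDR.unpack_from(data, 0)
    if ehdr[0][:5] != b"\x7fELF\x02":
        raise ValueError("not an ELF64 file")
    e_shoff, e_shentsize, e_shnum = ehdr[6], ehdr[11], ehdr[12]
    regions = {}
    for i in range(e_shnum):
        sh = SHDR.unpack_from(data, e_shoff + i * e_shentsize)
        sh_type, sh_addr, sh_offset, sh_size = sh[1], sh[3], sh[4], sh[5]
        if sh_type == SHT_PROGBITS:
            regions[sh_addr] = data[sh_offset:sh_offset + sh_size]
    return regions


def diff_regions(before, after, word=8):
    """Return (address, old_bytes, new_bytes) for each changed word.

    Only mappings present in both dumps are compared; mappings that exist
    in just one dump are skipped in this sketch.
    """
    changes = []
    for addr in sorted(before.keys() & after.keys()):
        old, new = before[addr], after[addr]
        for off in range(0, min(len(old), len(new)), word):
            if old[off:off + word] != new[off:off + word]:
                changes.append((addr + off, old[off:off + word], new[off:off + word]))
    return changes
```

Calling diff_regions(load_progbits("dump1"), load_progbits("dump2")) would then yield the changed words together with their virtual addresses.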

Based on elf.h, it’s relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.

With the sample implementation, we can re-evaluate our scenario above. An excerpt from the first diff:

The diff shows that *g_state (address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1.

The second diff with only the relevant offset:

The diff shows that *g_state (address 0x602260) was changed from 0x4008c1 to 0x1.
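
When reading a hexdump-style diff on x86_64, keep in mind that bytes are listed in memory order, so a pointer appears with its least significant byte first. A small Python sketch of the conversion, using the values from this scenario (the helper name qword_at is made up):

```python
def qword_at(raw, offset=0):
    """Interpret 8 bytes at offset as a little-endian 64-bit value (x86_64)."""
    return int.from_bytes(raw[offset:offset + 8], "little")


# Bytes as they would appear from left to right in a hexdump-style line.
before = bytes.fromhex("bc e2 ff ff ff 7f 00 00")
after = bytes.fromhex("c1 08 40 00 00 00 00 00")
print(hex(qword_at(before)), "->", hex(qword_at(after)))
# prints: 0x7fffffffe2bc -> 0x4008c1
```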


There you have it: a core dump diff. Now, whether or not that can prove to be useful depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.

The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting it to addresses of the .data and .bss sections of the executable or library to be debugged if changes in there are relevant to the debugging scenario.
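
Such a restriction is easy to bolt onto a diff. The Python sketch below assumes the diff is available as a list of tuples whose first element is the address, and that the section ranges were looked up beforehand (for example with readelf -S on the executable); the function name and the ranges in the usage note are hypothetical.

```python
def within_ranges(changes, ranges):
    """Keep only changes whose address lies in one of the [start, end) ranges."""
    return [
        change for change in changes
        if any(start <= change[0] < end for start, end in ranges)
    ]
```

For instance, with a made-up .data range of (0x602000, 0x602300), a change at 0x602260 is kept while a change on the stack at 0x7fffffffe2b0 is dropped.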

Another approach to reduce the scope: excluding changes to memory that is not referenced by the debugging subject. The relationship between arbitrary heap allocations and the executable or specific libraries is not immediately apparent. Based on the addresses of changes in your initial diff, you could search for pointers in the .data and .bss sections of the executable or library right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, register and stack references of library-owned threads), but it's a start.
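
A pointer search of this kind could look roughly like the following Python sketch. It scans a region (for example the bytes of .data) for aligned 64-bit little-endian words that point at, or shortly before, a changed address. The function name and the slack parameter, which stands in for an assumed maximum allocation size, are inventions for this illustration.

```python
def find_pointers(region_addr, region_data, targets, slack=0):
    """Find aligned 64-bit words that point at or just before a target.

    A word matches if value <= target <= value + slack for any target.
    Returns (pointer_location, pointer_value) pairs.
    """
    targets = sorted(set(targets))
    hits = []
    for off in range(0, len(region_data) - 7, 8):
        value = int.from_bytes(region_data[off:off + 8], "little")
        if any(value <= target <= value + slack for target in targets):
            hits.append((region_addr + off, value))
    return hits
```

With slack=0 only exact matches are reported; a larger value also catches pointers to the start of an allocation that contains the changed address, at the cost of more false positives.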
