How do I get a meaningful stack trace using MiniDumpWriteDump?

I'm trying to programmatically generate a stack trace. When my users hit a crash, particularly a random one, it's hard to talk them through the process of getting a dump so I can fix the problem. In the past, once they sent me the trace, I would cross-reference the addresses in it against the Intermediate/foo.map file to figure out which function was the problem (is that the best way?).
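
The map-file cross-referencing described above amounts to finding the symbol with the largest start address that does not exceed the crash address. A minimal C++ sketch of that lookup follows. The `MapEntry` type and the `FindSymbol` helper are hypothetical names, and parsing the actual .map file into entries is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// One parsed line of a linker .map file (hypothetical, simplified layout).
struct MapEntry {
    std::uint32_t address;  // start address of the function
    std::string   name;     // symbol name
};

// Returns the name of the function containing `addr`, or "" when `addr` lies
// below the first symbol. `entries` must be sorted by ascending address.
std::string FindSymbol(const std::vector<MapEntry>& entries, std::uint32_t addr) {
    assert(std::is_sorted(entries.begin(), entries.end(),
                          [](const MapEntry& a, const MapEntry& b) {
                              return a.address < b.address;
                          }));
    // upper_bound yields the first entry that starts strictly after addr,
    // so the entry just before it is the one that contains addr.
    auto it = std::upper_bound(
        entries.begin(), entries.end(), addr,
        [](std::uint32_t a, const MapEntry& e) { return a < e.address; });
    if (it == entries.begin())
        return "";
    return std::prev(it)->name;
}
```

This only works when the module is loaded at the base address the map file assumes; otherwise the crash address has to be rebased first.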

I built a library from various examples I found around the net to output a minidump and make my job easier. I staged a crash, but the stack trace I get from the minidump file is wildly different from the live stack trace I get by attaching WinDbg. Examples of both are below:

MiniDump.dmp:

KERNELBASE.dll!76a6c42d()
[Frames below may be incorrect and/or missing, no symbols loaded for KERNELBASE.dll]
KERNELBASE.dll!76a6c42d()
kernel32.dll!75bd14bd()
game.exe!00759035()
game.exe!00575ba3()

WinDbg.exe:

0:000:x86> kv
ChildEBP RetAddr  Args to Child              
00186f44 00bc8ea9 19460268 0018a9b7 03f70a28 Minidump!crashme+0x2 (FPO: [0,0,0]) (CONV: cdecl) [c:\project\debug\minidump.cpp @ 68]
0018795c 00b9ef31 0018796c 03f56c00 6532716d Main!LoadPlugin+0x339 (FPO: [1,642,4]) (CONV: cdecl) [c:\project\main\pluginloader.cpp @ 129]
00188968 00b9667d 19460268 0018a9ac 00000000 Main!Command+0x1f1 (FPO: [2,1024,4]) (CONV: cdecl) [c:\project\main\commands.cpp @ 2617]
*** WARNING: Unable to verify checksum for C:\Game\game.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Game\game.exe
0018b1a8 005b5095 19460268 0018beac 00000000 Main!Hook::Detour+0x52d (FPO: [2,2570,0]) (CONV: thiscall) [c:\project\main\hook.cpp @ 275]
WARNING: Stack unwind information not available. Following frames may be wrong.
0018b1b4 00000000 19495200 19495200 00000006 game+0x1b5095

game.exe is not mine, and I don't have its source or symbols. Main.dll is injected into game.exe and provides front-end functionality to load additional DLLs from within the game. The debug code and the staged crash are in Minidump.dll. After Main.dll loads Minidump.dll it calls AfterLoad(), which sets the exception filter and then triggers the crash.

When I opened MiniDump.dmp I pointed the debugger at all of my symbol files (except for game.exe, which I don't have symbols for), and that part seems to be working. I also point it at the game.exe binary, since I have that. The stack trace I get out of it just isn't helpful, though. My ultimate goal is for the user to load the DLL, cause the crash, and email the dump file to me; then I'll attach the symbol files and binaries and diagnose the problem for them. Am I doing something wrong, or is it just not possible to get what I want? The relevant minidump code is below:

#include <windows.h>
#include <tchar.h>
#include <dbghelp.h>

typedef BOOL (WINAPI *MINIDUMPWRITEDUMP)(
    HANDLE hProcess, 
    DWORD ProcessId, 
    HANDLE hFile, 
    MINIDUMP_TYPE DumpType,
    CONST PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
    CONST PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam,
    CONST PMINIDUMP_CALLBACK_INFORMATION CallbackParam
);

LONG WINAPI WriteDumpFilter(struct _EXCEPTION_POINTERS *pExceptionPointers)
{
    HANDLE hFile = NULL;
    HMODULE hDll = NULL;
    MINIDUMPWRITEDUMP pMiniDumpWriteDump = NULL;
    _MINIDUMP_EXCEPTION_INFORMATION ExceptionInformation = {0}; 

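    //load MiniDumpWriteDump; bail out if DbgHelp.dll or the export is missing
    hDll = LoadLibrary(TEXT("DbgHelp.dll"));
    if (hDll == NULL)
        return EXCEPTION_CONTINUE_SEARCH;
    pMiniDumpWriteDump = (MINIDUMPWRITEDUMP)GetProcAddress(hDll, "MiniDumpWriteDump");
    if (pMiniDumpWriteDump == NULL)
        return EXCEPTION_CONTINUE_SEARCH;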

    //create output file
    hFile = CreateFile( _T( "C:\\temp\\MiniDump.dmp"), 
                            GENERIC_READ|GENERIC_WRITE, 0, NULL, 
                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL ); 

    //bail if we don't have a file
    if ((hFile != NULL) && (hFile != INVALID_HANDLE_VALUE)) 
    {
        //get exception information
        ExceptionInformation.ThreadId           = GetCurrentThreadId(); 
        ExceptionInformation.ExceptionPointers  = pExceptionPointers; 
        ExceptionInformation.ClientPointers     = TRUE; 

        //write the debug dump
        pMiniDumpWriteDump( GetCurrentProcess(), GetCurrentProcessId(), 
                            hFile, MiniDumpWithFullMemory, &ExceptionInformation, 
                            NULL, NULL ); 


        //close the debug output file
        CloseHandle(hFile); 
    }

    return EXCEPTION_EXECUTE_HANDLER;
}

VOID crashme() {int* foo = 0; *foo = 0;}

VOID AfterLoad(VOID)
{
    SetUnhandledExceptionFilter(WriteDumpFilter);
    crashme();
}

I tried to trim some of the fat out of the details to simplify the problem, but I can be more explicit if needed. I found a good write-up on CodeProject and looked for more background reading to help me understand the problem, but what I found were just step-by-step guides to get it running (which it already is). Does anyone have an idea what I'm doing wrong, or can you point me to relevant reading?


After Sergei's suggestion I ran .ecxr in WinDbg and got better output, but it still doesn't match the trace I get when I attach WinDbg directly to the process and trigger the crash. Here is the minidump trace:

  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child              
WARNING: Stack unwind information not available. Following frames may be wrong.
0018e774 00759035 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x58
0018e7b4 00575ba3 00000000 00000000 00000001 game+0x359035
0018fc50 0057788a 009855ef 0018fdcb 00000001 game+0x175ba3
0018fc78 77b7e013 012d9230 002d91d0 002d9200 game+0x17788a
0018fc90 77ba9567 00290000 00000000 002d91d0 ntdll!RtlFreeHeap+0x7e
0018fd6c 0076ece2 0018ff78 007e1b7e ffffffff ntdll!LdrRemoveLoadAsDataTable+0x4e0
002bbc38 5c306174 61666544 00746c75 5d4c3055 game+0x36ece2
002bbc3c 61666544 00746c75 5d4c3055 8c000000 0x5c306174
002bbc40 00746c75 5d4c3055 8c000000 00000101 0x61666544
002bbc44 5d4c3055 8c000000 00000101 01000000 game+0x346c75
002bbc48 8c000000 00000101 01000000 00000000 0x5d4c3055
002bbc4c 00000000 01000000 00000000 0000006e 0x8c000000

And here is the trace from attaching the debugger to the process:

0:000:x86> kv
ChildEBP RetAddr  Args to Child              
00186f44 00bc8ea9 19460268 0018a9b7 03f70a28 Minidump!crashme+0x2 (FPO: [0,0,0]) (CONV: cdecl) [c:\project\debug\minidump.cpp @ 68]
0018795c 00b9ef31 0018796c 03f56c00 6532716d Main!LoadPlugin+0x339 (FPO: [1,642,4]) (CONV: cdecl) [c:\project\main\pluginloader.cpp @ 129]
00188968 00b9667d 19460268 0018a9ac 00000000 Main!Command+0x1f1 (FPO: [2,1024,4]) (CONV: cdecl) [c:\project\main\commands.cpp @ 2617]
*** WARNING: Unable to verify checksum for C:\Game\game.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Game\game.exe
0018b1a8 005b5095 19460268 0018beac 00000000 Main!Hook::Detour+0x52d (FPO: [2,2570,0]) (CONV: thiscall) [c:\project\main\hook.cpp @ 275]
WARNING: Stack unwind information not available. Following frames may be wrong.
0018b1b4 00000000 19495200 19495200 00000006 game+0x1b5095

I don't have the source for game.exe (I have it for the DLLs, which is where the error is), but I disassembled game.exe, and here is what is at game+0x359035:

.text:00759001 ; =============== S U B R O U T I N E =======================================
.text:00759001
.text:00759001 ; Attributes: library function bp-based frame
.text:00759001
.text:00759001 ; __stdcall _CxxThrowException(x, x)
.text:00759001 __CxxThrowException@8 proc near         ; CODE XREF: .text:0040100Fp
.text:00759001                                         ; sub_401640+98p ...
.text:00759001
.text:00759001 dwExceptionCode = dword ptr -20h
.text:00759001 dwExceptionFlags= dword ptr -1Ch
.text:00759001 nNumberOfArguments= dword ptr -10h
.text:00759001 Arguments       = dword ptr -0Ch
.text:00759001 var_8           = dword ptr -8
.text:00759001 var_4           = dword ptr -4
.text:00759001 arg_0           = dword ptr  8
.text:00759001 arg_4           = dword ptr  0Ch
.text:00759001
.text:00759001                 push    ebp
.text:00759002                 mov     ebp, esp
.text:00759004                 sub     esp, 20h
.text:00759007                 mov     eax, [ebp+arg_0]
.text:0075900A                 push    esi
.text:0075900B                 push    edi
.text:0075900C                 push    8
.text:0075900E                 pop     ecx
.text:0075900F                 mov     esi, offset unk_853A3C
.text:00759014                 lea     edi, [ebp+dwExceptionCode]
.text:00759017                 rep movsd
.text:00759019                 mov     [ebp+var_8], eax
.text:0075901C                 mov     eax, [ebp+arg_4]
.text:0075901F                 mov     [ebp+var_4], eax
.text:00759022                 lea     eax, [ebp+Arguments]
.text:00759025                 push    eax             ; lpArguments
.text:00759026                 push    [ebp+nNumberOfArguments] ; nNumberOfArguments
.text:00759029                 push    [ebp+dwExceptionFlags] ; dwExceptionFlags
.text:0075902C                 push    [ebp+dwExceptionCode] ; dwExceptionCode
.text:0075902F                 call    ds:RaiseException
.text:00759035                 pop     edi
.text:00759036                 pop     esi
.text:00759037                 leave
.text:00759038                 retn    8
.text:00759038 __CxxThrowException@8 endp
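
The debugger prints the frame as game+0x359035 while the disassembly lists the same instruction at .text:00759035. The two differ by the module's load address, which works out to 0x00400000 here. A small sketch of that conversion follows; the helper names `ToAbsolute` and `ToOffset` are hypothetical, and it assumes the module is loaded at the same base the disassembler used.

```cpp
#include <cassert>
#include <cstdint>

// Load address of game.exe implied by the traces above:
// 0x00759035 (absolute) - 0x359035 (offset) = 0x00400000.
constexpr std::uint32_t kGameBase = 0x00400000;

// Converts a debugger-style "module+offset" value to an absolute address.
constexpr std::uint32_t ToAbsolute(std::uint32_t base, std::uint32_t offset) {
    return base + offset;
}

// Converts an absolute address back to an offset within the module.
inline std::uint32_t ToOffset(std::uint32_t base, std::uint32_t address) {
    assert(address >= base);  // address must lie at or above the module base
    return address - base;
}

// game+0x359035 is the instruction right after the call to RaiseException.
static_assert(ToAbsolute(kGameBase, 0x359035) == 0x00759035,
              "offset and absolute address should agree");
```

The same arithmetic maps the other frames, for example game+0x175ba3 to 0x00575ba3.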

The error I'm triggering is in Minidump.dll, but this code at the top of the stack is in game.exe. There could be plenty going on inside game.exe that I'm unaware of. Could it be hijacking the error that I'm triggering somehow? That is, I trigger the error in the DLL, but something set up in game.exe captures program flow before the exception filter that writes the minidump is called?

If that's the case, then the fact that attaching the debugger, triggering the error, and getting the right output (pointing to the error in my DLL) works means game.exe isn't capturing the program flow before the debugger can do the trace. How could I make my minidump code behave the same way? This is getting into territory I'm not terribly familiar with. Any ideas?


I chased further back, and the function calling that one has this line in it:

.text:00575A8D                 mov     esi, offset aCrashDumpTooLa ; "Crash dump too large to send.\n"

So I think game.exe is hijacking the exception to write its own dump before my code tries to write one, and my dump's trace is just a trace of game.exe's dump process.


Answer

I've got it figured out. I'm not sure how to answer my own post, so here is the deal.

.text:0057494A                 push    offset aDbghelp_dll ; "DbgHelp.dll"
.text:0057494F                 call    ds:LoadLibraryA
.text:00574955                 test    eax, eax
.text:00574957                 jz      short loc_5749C8
.text:00574959                 push    offset aMinidumpwrited ; "MiniDumpWriteDump"
.text:0057495E                 push    eax             ; hModule
.text:0057495F                 call    ds:GetProcAddress
.text:00574965                 mov     edi, eax
.text:00574967                 test    edi, edi
.text:00574969                 jz      short loc_5749C8
.text:0057496B                 mov     edx, lpFileName
.text:00574971                 push    0               ; hTemplateFile
.text:00574973                 push    80h             ; dwFlagsAndAttributes
.text:00574978                 push    2               ; dwCreationDisposition
.text:0057497A                 push    0               ; lpSecurityAttributes
.text:0057497C                 push    0               ; dwShareMode
.text:0057497E                 push    40000000h       ; dwDesiredAccess
.text:00574983                 push    edx             ; lpFileName
.text:00574984                 call    ds:CreateFileA
.text:0057498A                 mov     esi, eax
.text:0057498C                 cmp     esi, 0FFFFFFFFh
.text:0057498F                 jz      short loc_5749C8
.text:00574991                 call    ds:GetCurrentThreadId
.text:00574997                 push    0
.text:00574999                 push    0
.text:0057499B                 mov     [ebp+var_1C], eax
.text:0057499E                 lea     eax, [ebp+var_1C]
.text:005749A1                 push    eax
.text:005749A2                 push    0
.text:005749A4                 push    esi
.text:005749A5                 mov     [ebp+var_18], ebx
.text:005749A8                 mov     [ebp+var_14], 1
.text:005749AF                 call    ds:__imp_GetCurrentProcessId
.text:005749B5                 push    eax
.text:005749B6                 call    ds:GetCurrentProcess
.text:005749BC                 push    eax
.text:005749BD                 call    edi
.text:005749BF                 push    esi             ; hObject
.text:005749C0                 call    ds:CloseHandle
.text:005749C6                 jmp     short loc_574A02

That's from game.exe. It turns out game.exe writes its own minidump. My minidump was triggered after theirs, so what I was seeing in my stack trace was a trace of their dump process. I found a .dmp file in the game's installation directory, and once I loaded my symbols into it, it showed the correct output I was after.

There are 2 answers

Kmus (best answer)

I figured it out. Basically game.exe had its own MiniDumpWriteDump code that was triggering before my code. So the stack trace I was getting wasn't a trace of the error, it was a trace of game.exe doing its own MiniDump. I put more details up in the original post.

Thanks!

Sergei Vorobiev

You are doing just fine. When you open the minidump you generated, after you load the symbols, do

.ecxr

first, to set the context to what you saved in the ExceptionParam argument (your ExceptionInformation structure) passed to MiniDumpWriteDump(). Then you will have a legitimate stack trace.

We use a similar dump generation mechanism at the place where I work.

There are some future gotchas, though. You want to check whether your dump-catching mechanism is triggered by things like an abort() call.

For that, check out _set_invalid_parameter_handler() and signal(SIGABRT, ...).
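
As a rough illustration of the signal(SIGABRT, ...) hook mentioned above, the sketch below installs a handler using only the standard C++ library. The names `OnAbort`, `InstallAbortHook` and `g_abortSeen` are hypothetical. A real handler would start the dump-writing path instead of setting a flag, and the MSVC-specific _set_invalid_parameter_handler() is not shown.

```cpp
#include <csignal>

// Set by the handler; volatile sig_atomic_t is the type the standard
// guarantees can be written safely from a signal handler.
volatile std::sig_atomic_t g_abortSeen = 0;

// Hypothetical handler: a real one would begin writing the dump here.
extern "C" void OnAbort(int /*signum*/) {
    g_abortSeen = 1;
}

// Installs the handler; returns false if the runtime rejected it.
bool InstallAbortHook() {
    return std::signal(SIGABRT, OnAbort) != SIG_ERR;
}
```

Calling InstallAbortHook() once at load time, next to the SetUnhandledExceptionFilter() call, would be one place to put it. Note that abort() still terminates the process after the handler returns, so any dump has to be written from inside the handler.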