Exceptions on unmanaged threads in .NET


How do I handle situations where my app is terminating, using a callback that runs before termination?

The .NET handlers do not work in the following scenario. Is SetUnhandledExceptionFilter the correct choice? It appears to have the shortcomings discussed below.

Scenario

In our .NET app, I want to respond to every kind of app termination with a message and an error report to our service.

However, in one of our WPF apps, two of our testers get unhandled exceptions that bypass:

  • AppDomain.UnhandledException (most importantly)
  • Application.ThreadException
  • Dispatcher.UnhandledException

The handlers are marked [SecurityCritical] and [HandleProcessCorruptedStateExceptions], and legacyCorruptedStateExceptionsPolicy is set to true in app.config.

My two examples in the wild:

  • VirtualBox running Windows 10 throws inside vboxd3d.dll somewhere while WPF initializes (turning off VirtualBox 3D acceleration "fixes" it).
  • A Windows 8 machine with a suspicious "run on graphics card A/B" option in the shell context menu crashes somewhere during WPF startup, but only when anti-cracking tools are applied.

Either way, in production the app must respond to these kinds of failures before it terminates.

I can reproduce this with an unmanaged exception that occurs on an unmanaged thread created by a P/Invoked method:

test.dll

#include <windows.h>

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

DWORD WINAPI myThread(LPVOID lpParameter)
{
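    // Deliberate access violation; the volatile read stops the optimizer
    // from discarding the unused load in release builds.
    long testfail = *(volatile long*)(-9022);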
    return 1;
}

extern "C" __declspec(dllexport) void test()
{
    DWORD tid;
    HANDLE myHandle = CreateThread(0, 0, myThread, NULL, 0, &tid);
    WaitForSingleObject(myHandle, INFINITE);
    CloseHandle(myHandle);
}

app.exe

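using System;
using System.Runtime.ExceptionServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Windows;

class TestApp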
{
    [DllImport("kernel32.dll")]
    static extern FilterDelegate SetUnhandledExceptionFilter(FilterDelegate lpTopLevelExceptionFilter);

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int FilterDelegate(IntPtr exception_pointers);

    static int Win32Handler(IntPtr nope)
    {
        MessageBox.Show("Native uncaught SEH exception"); // show + report or whatever
        Environment.Exit(-1); // exit and avoid WER etc
        return 1; // that's EXCEPTION_EXECUTE_HANDLER, although this line is never reached because of Environment.Exit
    }

    [DllImport("test.dll")]
    static extern void test();

    [STAThread]
    public static void Main(string[] args)
    {
        AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
        SetUnhandledExceptionFilter(Win32Handler);
        test(); // This is caught by Win32Handler, not CurrentDomain_UnhandledException
    }
    [SecurityCritical, HandleProcessCorruptedStateExceptions ]
    static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        Exception ex = e.ExceptionObject as Exception;
        MessageBox.Show(ex.ToString()); // show + report or whatever
        Environment.Exit(-1); // exit and avoid WER etc
    }
}

This handles the failure in vboxd3d.dll in a bare WPF test app, which of course also has the WPF Dispatcher and WinForms Application (why not) exception handlers registered.

Updates

  • In the production code I am trying to use this in, the handler appears to get overwritten by some other caller. I can get around that by calling the method every 100 ms, which is stupid, of course.
    • On the machine with the vboxd3d.dll problem, doing the above replaces the exception with one in clr.dll.
    • It appears that at the time of the crash, the managed function pointer passed into kernel32 is no longer valid. Setting the handler from a native helper DLL, so that the filter is a native function, appears to work. The managed function is a static method, so I'm not sure pinning applies here; perhaps the CLR is in the process of terminating...
    • Indeed, the managed delegate was being collected. No "overwriting" of the handler was occurring. I've added this as an answer; I'm not sure what to accept or what the SO convention is here...

There are 3 answers

Answer by plinth

I've had to deal with, shall we say, unpredictable unmanaged libraries.

If you're P/Invoking into the unmanaged code, you may have problems there. I've found it easier to use C++/CLI wrappers around the unmanaged code, and in some cases I've written another set of unmanaged C++ wrappers around the library before getting to the C++/CLI layer.

You might be thinking, "why on earth would you write two sets of wrappers?"

The first reason is that isolating the unmanaged code makes it easier to trap exceptions and make them more palatable.

The second is purely pragmatic: if you have a static library (not a DLL) that uses the STL, you will find that the linker will magically give all code, managed and unmanaged, the CLI implementation of the STL functions. Every time you then access an STL data structure from unmanaged code, you end up doing multiple transitions between managed and unmanaged code, and your performance will tank. The easiest way to prevent that is to completely isolate the code that uses the STL. You might think to yourself, "I'm a scrupulous programmer. I'll be super careful to put #pragma managed and/or #pragma unmanaged wrappers in the right places and I'm all set." Nope, nope, and nope. Not only is this difficult and unreliable, but when (not if) you fail to do it properly, you won't have a good way to detect it.

And as always, you should ensure that whatever wrappers you write are chunky rather than chatty.
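To make "chunky rather than chatty" concrete, here is a minimal C++ sketch. The function names and the crossing counter are hypothetical; the counter merely stands in for the cost of each managed/unmanaged transition. The chatty interface needs one boundary call per pixel, while the chunky one processes the whole buffer in a single call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counter standing in for the cost of a managed <-> unmanaged
// transition; each image-processing entry point increments it once per call.
static std::size_t g_boundary_crossings = 0;

// Chatty: the caller has to cross the boundary once per pixel.
extern "C" std::uint8_t invert_pixel(std::uint8_t px)
{
    ++g_boundary_crossings;
    return static_cast<std::uint8_t>(255 - px);
}

// Chunky: the caller crosses the boundary once for the whole buffer.
// Returns 0 on success, -1 on invalid arguments (no exceptions escape).
extern "C" int invert_pixels(std::uint8_t* pixels, std::size_t count)
{
    ++g_boundary_crossings;
    if (pixels == nullptr && count != 0)
        return -1;
    for (std::size_t i = 0; i < count; ++i)
        pixels[i] = static_cast<std::uint8_t>(255 - pixels[i]);
    return 0;
}

// Diagnostic getter; deliberately not counted as a crossing.
extern "C" std::size_t boundary_crossings() { return g_boundary_crossings; }
```

Inverting a 4-pixel buffer through invert_pixel costs four crossings; invert_pixels does the same work in one, and the gap grows linearly with the buffer size.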

Here is a typical chunk of unmanaged code to deal with an unstable library:

try {
    // a bunch of setup code that you don't need to see,
    // reduced to this:
    SomeImageType *outImage = GetImage();
    // I was having problems with the heap getting mangled,
    // so heapcheck() is a conditional macro that calls _heapchk()
    heapcheck();
    return outImage;
}
catch (std::bad_alloc &) {
    throw MyLib::MyLibNoMemory();
}
catch (MyLib::MyLibFailure &)
{
    // rethrow the original object; "throw err;" would copy (and slice) it
    throw;
}
catch (const char*)
{
    // seriously, some code throws a string.
    throw;
}
catch (...) {
    throw MyLib::MyLibFailure(MyLib::MyFailureReason::kUnknown2);
}
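The wrapper above translates exceptions into the library's own types, but those types still should not propagate out of the function that P/Invoke actually calls. A common pattern is to catch everything at the exported entry point and return a status code instead. In this sketch, MyLibStatus, risky_operation and mylib_do_work are hypothetical names; risky_operation stands in for the wrapped library call. Note that this only covers C++ exceptions: with /EHsc, catch (...) does not catch structured (SEH) exceptions such as the access violation in the question's test.dll.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical status codes returned across the C boundary.
enum MyLibStatus : int {
    kMyLibOk = 0,
    kMyLibNoMemory = 1,
    kMyLibFailure = 2,
    kMyLibUnknown = 3,
};

// Hypothetical internal function that may throw anything, standing in for
// the unstable third-party library call.
static int risky_operation(int mode)
{
    if (mode == 1) throw std::bad_alloc();
    if (mode == 2) throw std::runtime_error("library failure");
    if (mode == 3) throw "seriously, a string";
    return 42;
}

// Exported entry point: every C++ exception is caught here and turned into a
// status code, so nothing propagates into the P/Invoke caller.
extern "C" int mylib_do_work(int mode, int* result) noexcept
{
    if (result == nullptr)
        return kMyLibFailure;
    try {
        *result = risky_operation(mode);
        return kMyLibOk;
    } catch (const std::bad_alloc&) {
        return kMyLibNoMemory;
    } catch (const std::exception&) {
        return kMyLibFailure;
    } catch (...) {
        return kMyLibUnknown;   // strings, ints, unknown types
    }
}
```

On the managed side, the P/Invoke declaration then only has to check an int, which is far easier to turn into a meaningful .NET exception than an opaque SEHException.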
Answer by David Higgins

The problem with the code in the question was this:

SetUnhandledExceptionFilter(Win32Handler);

Because the compiler automatically creates a delegate, this is equivalent to:

FilterDelegate del = new FilterDelegate(Win32Handler);
SetUnhandledExceptionFilter(del);

The problem is that the GC can collect the delegate, and the native-to-managed thunk created for it, at any point after its final reference. So:

SetUnhandledExceptionFilter(Win32Handler);
GC.Collect();
native_crash_on_unmanaged_thread();

will cause a nasty crash, because the handler passed into kernel32.dll is no longer a valid function pointer. The remedy is to keep the delegate reachable for as long as native code may call it:

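public class Program
{
    // A static field is a GC root for the lifetime of the AppDomain, so the
    // delegate and its native->managed thunk cannot be collected.
    static FilterDelegate mdel;

    public static void Main(string[] args)
    {
        mdel = new FilterDelegate(Win32Handler);
        SetUnhandledExceptionFilter(mdel);
        // Alternative: keep a local "del" and call GC.KeepAlive(del) as the
        // LAST statement of Main. GC.KeepAlive only keeps the object alive up
        // to the point where it is called, so calling it before GC.Collect()
        // would not help.
        GC.Collect();                       // no longer frees the thunk
        native_crash_on_unmanaged_thread(); // placeholder, e.g. test() from the question
    }
}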

The other answers are also a great resource; I'm not sure what to mark as the answer right now.

Answer by zmbq

An exception that can't be handled properly can always happen, and the process may die unexpectedly no matter how hard you try to protect it from within. However, you can monitor it from the outside.

Have another process monitor your main process. If the main process suddenly disappears without logging an error or reporting things gracefully, the second process can do that. The second process can be a lot simpler, with no unmanaged calls at all, so the chances of it disappearing all of a sudden are significantly smaller.
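Here is a minimal sketch of that idea as a launcher that starts the real work in a child process and watches it. It assumes a POSIX system (fork and waitpid) purely so the sketch stays short and runnable; on Windows the equivalent would be CreateProcess, WaitForSingleObject and GetExitCodeProcess, where a crash typically shows up as an exception code such as 0xC0000005 in the exit code. ExitKind and run_and_watch are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// How the monitored process ended, as seen by the watchdog.
enum class ExitKind { Clean, ReportedError, Crashed, WatchFailed };

// Runs child_main in a separate process and waits for it. The parent stays
// deliberately simple: no unmanaged/third-party calls, just fork + waitpid.
ExitKind run_and_watch(int (*child_main)())
{
    pid_t pid = fork();
    if (pid < 0)
        return ExitKind::WatchFailed;
    if (pid == 0) {
        // Child: run the "main app" and exit with its status code.
        _exit(child_main());
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return ExitKind::WatchFailed;
    if (WIFSIGNALED(status))
        return ExitKind::Crashed;   // killed by a signal, e.g. SIGSEGV or SIGABRT
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return ExitKind::Clean;
    return ExitKind::ReportedError; // non-zero exit code chosen by the app
}
```

When run_and_watch returns Crashed, the watchdog is the place to send the error report that the main process never got the chance to send.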

And as a last resort, when your processes start, check whether they shut down properly the previous time. If not, you can report a bad shutdown at that point. This is useful even if the entire machine dies.
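One simple way to implement that startup check is a marker file: create it at startup, delete it at the end of a graceful shutdown, and treat a marker that is still present at the next startup as evidence of a crash or power loss. This C++17 sketch uses std::filesystem; the function names and the marker location are hypothetical, and a real app would keep the marker somewhere per-user (and would need extra care if several instances can run at once).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Call once at startup. Returns true if the previous run did not shut down
// cleanly (its marker file is still present), then writes a fresh marker.
bool previous_run_crashed(const fs::path& marker)
{
    std::error_code ec;
    bool crashed = fs::exists(marker, ec);
    std::ofstream out(marker);
    out << "running\n";
    return crashed;
}

// Call at the very end of a graceful shutdown. Removing the marker is what
// distinguishes a clean exit from a crash or power loss.
void mark_clean_shutdown(const fs::path& marker)
{
    std::error_code ec;
    fs::remove(marker, ec);
}
```

If previous_run_crashed returns true, the app can send its "bad shutdown" report right away, before doing anything else that might crash again.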