I have a multi-threaded DLL for a third-party application. My DLL invokes callbacks on the main UI thread by calling SendMessage with a custom (registered) message:
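#include <windows.h>

// Linker-provided symbol located at this module's base address (MSVC).
EXTERN_C IMAGE_DOS_HEADER __ImageBase;

typedef void (*CallbackFunctionType)();

static LRESULT CALLBACK wndProcedure(
    HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);

UINT _wm;        // message ID returned by RegisterWindowMessage
HWND _hwnd;      // hidden window that receives the invoke messages
DWORD threadId;  // ID of the thread that created the window (the main UI thread)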
void Initialize()
{
    _wm = RegisterWindowMessage("MyInvokeMessage");

    WNDCLASS wndclass = {0};
    wndclass.hInstance = (HINSTANCE)&__ImageBase;
    wndclass.lpfnWndProc = wndProcedure;
    wndclass.lpszClassName = "MessageOnlyWindow";
    RegisterClass(&wndclass);

    _hwnd = CreateWindow(
        "MessageOnlyWindow",
        NULL,            // window name
        0,               // style
        CW_USEDEFAULT,
        CW_USEDEFAULT,
        CW_USEDEFAULT,
        CW_USEDEFAULT,
        NULL,            // parent
        NULL,            // menu
        (HINSTANCE)&__ImageBase,
        NULL);

    // Remember which thread owns the window.
    threadId = GetCurrentThreadId();
}
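void InvokeSync(CallbackFunctionType funcPtr)
{
    // From a background thread, block until the window's thread has run funcPtr.
    if (_hwnd != NULL && threadId != GetCurrentThreadId())
        SendMessage(_hwnd, _wm, 0, (LPARAM)funcPtr);
    else
        funcPtr();  // already on the window's thread (or no window): call directly
}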
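static LRESULT CALLBACK wndProcedure(
    HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)
{
    if (Msg == _wm)
    {
        // lParam carries the function pointer passed to SendMessage.
        CallbackFunctionType funcPtr = (CallbackFunctionType)lParam;
        (*funcPtr)();
        return 0;  // handled; no need to pass it to DefWindowProc
    }
    return DefWindowProc(hWnd, Msg, wParam, lParam);
}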
The application is MDI, and I'm running an open document / extract contents / process in background / save cycle over a batch of documents, so it is constantly switching the active document and opening and closing documents.
My issue is that sometimes the processing gets stuck when it's trying to invoke messages onto the main thread, using the above InvokeSync() function.
When I pause it in a debugger, I see the main thread has this call stack:
user32.dll!_NtUserGetMessage@16() + 0x15 bytes
user32.dll!_NtUserGetMessage@16() + 0x15 bytes
mfc42.dll!CWinThread::PumpMessage() + 0x16 bytes
// the rest is normal application stuff
And the background thread that's locked up has a call stack like this:
user32.dll!_NtUserMessageCall@28() + 0x15 bytes
user32.dll!_NtUserMessageCall@28() + 0x15 bytes
mydll!InvokeSync(funcPtr)
// the rest is expected dll stuff
So it appears to be getting stuck on the "SendMessage()" call, but as far as I can see, the message pump on the main thread is sitting there idle.
However, if I manually click on an inactive document (to make it active), this somehow wakes everything up: the SendMessage() call finally goes through and processing resumes.
The main application uses Windows fibers, one fiber per document. Could my SendMessage be getting stuck on a fiber right before that fiber is switched out, so that the fiber only gets around to handling its messages once something forces a context switch back to it? I really don't understand how threads and fibers interact with each other, so I'm grasping at straws.
What could cause messages to sit there unhandled like this? More importantly, is there a way to prevent this situation from occurring? Or at least, how do I debug such a situation?
I went ahead and implemented my own message queue, with a message format that uses one semaphore to signal that a message has been received and another to signal that it has completed. The sending thread repeats PostMessage every second until "message received" is signalled, then waits for "message complete" with an infinite timeout.
Any extra PostMessages are ignored: they no longer carry a payload to execute, they only tell the main thread to check the queue for incoming events.
Since I made these changes, I have not run into the situation again. As best I can tell, the sent message must be ending up on the queue of a switched-out fiber, where it is forgotten until that fiber is switched in again. By reposting the message, the sender can keep retrying until the active fiber notices the message sitting there.
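
For anyone who wants to try the same idea, here is a minimal sketch of the retry scheme described above. It is a portable approximation under stated assumptions, not the code from my DLL: the names InvokeQueue, Invocation, Drain and the wake callback are hypothetical, wake stands in for a payload-free PostMessage(_hwnd, _wm, 0, 0), and a mutex plus condition variable replaces the two semaphores.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// One queued request. The two flags play the role of the "message received"
// and "message complete" semaphores (all names here are hypothetical).
struct Invocation {
    std::function<void()> work;
    bool received = false;
    bool completed = false;
};

class InvokeQueue {
public:
    // wake: assumed stand-in for PostMessage(_hwnd, _wm, 0, 0). It carries
    // no payload and is allowed to get lost.
    explicit InvokeQueue(
        std::function<void()> wake,
        std::chrono::milliseconds retry = std::chrono::milliseconds(1000))
        : wake_(std::move(wake)), retry_(retry) {}

    // Background thread: queue the work, nudge the main thread until it has
    // picked the request up, then wait without timeout for it to finish.
    // Returns how many wake-ups were sent.
    int InvokeSync(std::function<void()> work) {
        assert(work != nullptr);
        auto inv = std::make_shared<Invocation>();
        inv->work = std::move(work);

        std::unique_lock<std::mutex> lock(mutex_);
        pending_.push_back(inv);
        int wakeups = 0;
        while (!inv->received) {
            lock.unlock();
            wake_();  // repost; harmless if an earlier wake-up got through
            ++wakeups;
            lock.lock();
            cv_.wait_for(lock, retry_, [&] { return inv->received; });
        }
        cv_.wait(lock, [&] { return inv->completed; });
        return wakeups;
    }

    // Main thread: call whenever a wake-up arrives. Extra wake-ups find an
    // empty queue and do nothing. Returns the number of requests executed.
    int Drain() {
        int ran = 0;
        for (;;) {
            std::shared_ptr<Invocation> inv;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) return ran;
                inv = pending_.front();
                pending_.pop_front();
                inv->received = true;   // "message received"
            }
            cv_.notify_all();
            inv->work();                // run the payload on the main thread
            {
                std::lock_guard<std::mutex> lock(mutex_);
                inv->completed = true;  // "message complete"
            }
            cv_.notify_all();
            ++ran;
        }
    }

private:
    std::function<void()> wake_;
    std::chrono::milliseconds retry_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::shared_ptr<Invocation>> pending_;
};
```

In a Win32 setting, the wake callback would presumably call PostMessage and the window procedure would call Drain() when it sees _wm. Because the payload lives in the queue rather than in the message, a lost or duplicated wake-up costs nothing beyond one retry interval.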