My team has encountered a deadlock that I suspect is a bug in the Windows implementation of SRW locks. The code below is a distilled version of real code. Here's the summary:
- Main thread acquires exclusive lock
- Main thread creates N children threads
- Each child thread
  - Acquires a shared lock
  - Spins until all children have acquired a shared lock
  - Releases the shared lock
- Main thread releases exclusive lock
Yes this could be done with std::latch in C++20. That's not the point.
This code works most of the time. However, roughly 1 in 5000 loops deadlocks. When it does, exactly 1 child successfully acquires a shared lock and N-1 children are stuck in lock_shared(). On Windows this function calls into RtlAcquireSRWLockShared and blocks in NtWaitForAlertByThreadId.
The behavior is observed when using std::shared_mutex directly, when using std::shared_lock/std::unique_lock, or when simply calling the SRW functions directly.
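For reference, the RAII-guard variant mentioned above might look roughly like the sketch below. It runs a single round of the same pattern with std::unique_lock and std::shared_lock instead of manual lock/unlock calls. The function name RunRoundWithGuards and the finished counter are hypothetical and only there for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original repro): runs one round of the
// pattern using RAII guards and returns how many readers got past the
// rendezvous.
int RunRoundWithGuards(int numThreads) {
    std::shared_mutex mutex;
    std::atomic<int> readCounter{0};
    std::atomic<int> finished{0};

    std::vector<std::thread> readers;
    readers.reserve(numThreads);

    {
        // Exclusive lock is held while the reader threads are created.
        std::unique_lock<std::shared_mutex> writer(mutex);
        for (int i = 0; i < numThreads; ++i) {
            readers.emplace_back([&] {
                // Blocks until the writer guard goes out of scope.
                std::shared_lock<std::shared_mutex> reader(mutex);
                readCounter.fetch_add(1);
                // Spin until every reader holds a shared lock.
                while (readCounter.load() != numThreads) {
                    std::this_thread::yield();
                }
                finished.fetch_add(1);
            });
        }
    }  // writer guard released here

    for (auto& t : readers) {
        t.join();
    }
    return finished.load();
}
```

Calling RunRoundWithGuards(5) in a loop corresponds to the loop in the example code further down. On an affected Windows build a round would be expected to hang occasionally in the same way, rather than return.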
A 2017 Raymond Chen post asks about this exact behavior, but user error is blamed.
This looks like an SRW bug to me. It may be worth noting that if a child skips the latch step and simply calls unlock_shared, this wakes its blocked siblings. There is nothing in the documentation for std::shared_lock or *SRW* that suggests a shared acquire is allowed to block even when there is no active exclusive lock.
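The unlock_shared observation suggests one way to instrument the repro so that a stuck round gets reported instead of hanging forever. The sketch below is only an illustration under that assumption: each reader gives up on the rendezvous after a time limit and releases its shared lock, which (per the observation above) should let the blocked siblings proceed. The name RunRoundWithTimeout and the limit parameter are hypothetical, and a timeout can also be caused by ordinary scheduling delays, so a false result is a hint rather than proof.

```cpp
#include <atomic>
#include <chrono>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original repro).
// Returns true if no reader gave up waiting, false if at least one reader
// hit the time limit before all readers had acquired a shared lock.
bool RunRoundWithTimeout(int numThreads, std::chrono::milliseconds limit) {
    std::shared_mutex mutex;
    std::atomic<int> readCounter{0};
    std::atomic<bool> timedOut{false};

    std::vector<std::thread> readers;
    readers.reserve(numThreads);

    mutex.lock();  // exclusive lock held while the readers start
    for (int i = 0; i < numThreads; ++i) {
        readers.emplace_back([&] {
            mutex.lock_shared();
            readCounter.fetch_add(1);
            const auto deadline = std::chrono::steady_clock::now() + limit;
            while (readCounter.load() != numThreads) {
                if (std::chrono::steady_clock::now() >= deadline) {
                    timedOut.store(true);  // record the stall and bail out
                    break;
                }
                std::this_thread::yield();
            }
            mutex.unlock_shared();
        });
    }
    mutex.unlock();  // readers may now acquire shared ownership

    for (auto& t : readers) {
        t.join();
    }
    return !timedOut.load();
}
```

Looping over RunRoundWithTimeout(5, std::chrono::milliseconds(5000)) and counting the false results would give a rough stall rate without needing to attach a debugger to a hung process.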
This deadlock has not been observed on non-Windows platforms.
Example code:
#include <atomic>
#include <cstdint>
#include <iostream>
#include <memory>
#include <shared_mutex>
#include <thread>
#include <vector>

struct ThreadTestData {
    int32_t numThreads = 0;
    std::shared_mutex sharedMutex = {};
    std::atomic<int32_t> readCounter;
};

int DoStuff(ThreadTestData* data) {
    // Acquire reader lock
    data->sharedMutex.lock_shared();

    // wait until all read threads have acquired their shared lock
    data->readCounter.fetch_add(1);
    while (data->readCounter.load() != data->numThreads) {
        std::this_thread::yield();
    }

    // Release reader lock
    data->sharedMutex.unlock_shared();
    return 0;
}

int main() {
    int count = 0;
    while (true) {
        ThreadTestData data = {};
        data.numThreads = 5;

        // Acquire write lock
        data.sharedMutex.lock();

        // Create N threads
        std::vector<std::unique_ptr<std::thread>> readerThreads;
        readerThreads.reserve(data.numThreads);
        for (int i = 0; i < data.numThreads; ++i) {
            readerThreads.emplace_back(std::make_unique<std::thread>(DoStuff, &data));
        }

        // Release write lock
        data.sharedMutex.unlock();

        // Wait for all readers to succeed
        for (auto& thread : readerThreads) {
            thread->join();
        }

        // Cleanup
        readerThreads.clear();

        // Spew so we can tell when it's deadlocked
        count += 1;
        std::cout << count << std::endl;
    }
    return 0;
}
Here's a picture of the parallel stacks. You can see the main thread is correctly blocking on thread::join. One reader thread acquired the lock and is in a yield loop. Four reader threads are blocked within lock_shared.
This is a confirmed bug in the OS SlimReaderWriter API.
I posted a thread in r/cpp on Reddit because I knew Reddit user u/STL works on Microsoft's STL implementation and is an active user.
u/STL posted a comment declaring it an SRW bug. He filed OS bug report OS-49268777, "SRWLOCK can deadlock after an exclusive owner has released ownership and several reader threads are attempting to acquire shared ownership together". Unfortunately this is a Microsoft-internal bug tracker, so we can't follow it.
Thanks to commenters in this thread (RbMm in particular) for helping fully explain and understand the observed behavior.
RbMm posted a secondary answer which appears to show that "AcquireSRWLockShared some time can really acquires a slim SRW lock in exclusive mode". Read his response for details. I think almost everyone would be surprised by this behavior!