I'm using clang++ from LLVM 17 (17.0.6, with libc++). I have two classes, Foo and Bar, that refer to each other through std::weak_ptr data members. Multiple threads can obtain copies of a std::shared_ptr<Bar>. Occasionally, TSAN reports a data race while one thread is destroying its copy of the std::shared_ptr<Bar>.
Here is my example code; the <<< comments mark the lines where TSAN reports the problem:
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class Bar;

class Foo : public std::enable_shared_from_this<Foo> {
public:
    // Returns the live Bar if one exists, otherwise creates a new one.
    std::shared_ptr<Bar> get_instance() {
        std::lock_guard<std::mutex> lck(m_);
        std::shared_ptr<Bar> sh = _ptr.lock();
        if (sh) {
            return sh;
        } else {
            sh = std::make_shared<Bar>(weak_from_this());
            _ptr = sh; // <<<< TSAN: write, the old weak_ptr is released here
        }
        return sh;
    }
private:
    std::weak_ptr<Bar> _ptr;
    std::mutex m_;
};

class Bar {
public:
    Bar(std::weak_ptr<Foo> ptr) : _ctx(ptr) {}
private:
    std::weak_ptr<Foo> _ctx;
};

void get_and_release(std::shared_ptr<Foo> aP) {
    for (int i = 0; i < 1000; i++) {
        std::shared_ptr<Bar> p = aP->get_instance(); // <<< TSAN: previous read, during p's d'tor
    }
}

int main() {
    std::shared_ptr<Foo> aP = std::make_shared<Foo>();
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; t++) {
        threads.emplace_back(get_and_release, aP);
    }
    for (auto& thread : threads) {
        thread.join();
    }
}
When multiple threads call get_and_release() at the same time, TSAN reports the following race:
WARNING: ThreadSanitizer: data race (pid=21947)
Write of size 8 at 0x7b0c00001800 by thread T2 (mutexes: write M0):
#0 operator delete(void*) ../lib/tsan/rtl/tsan_new_delete.cpp:126:3 (example+0xea98e)
#1 void std::__1::__libcpp_operator_delete[abi:ue170006]<void*>(void*) ../llvm-17.0.6-n6964152/include/c++/v1/new:278:3 (example+0xed245)
#2 void std::__1::__do_deallocate_handle_size[abi:ue170006]<>(void*, unsigned long) ../llvm-17.0.6-n6964152/include/c++/v1/new:302:10 (example+0xed1c1)
#3 std::__1::__libcpp_deallocate[abi:ue170006](void*, unsigned long, unsigned long) ../llvm-17.0.6-n6964152/include/c++/v1/new:318:14 (example+0xed0da)
#4 std::__1::allocator<std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> > >::deallocate[abi:ue170006](std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> >*, unsigned long) ../llvm-17.0.6-n6964152/include/c++/v1/__memory/allocator.h:130:13 (example+0xed03e)
#5 std::__1::allocator_traits<std::__1::allocator<std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> > > >::deallocate[abi:ue170006](std::__1::allocator<std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> > >&, std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> >*, unsigned long) ../llvm-17.0.6-n6964152/include/c++/v1/__memory/allocator_traits.h:288:13 (example+0xecfc5)
#6 std::__1::__shared_ptr_emplace<Bar, std::__1::allocator<Bar> >::__on_zero_shared_weak() ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:332:9 (example+0xecc7c)
#7 std::__1::weak_ptr<Bar>::~weak_ptr() ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:1777:19 (example+0xed786)
#8 std::__1::enable_if<__compatible_with<Bar, Bar>::value, std::__1::weak_ptr<Bar>&>::type std::__1::weak_ptr<Bar>::operator=<Bar>(std::__1::shared_ptr<Bar> const&) ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:1836:5 (example+0xebf37)
#9 Foo::get_instance() A.cc:18:18 (example+0xeb5c7)
#10 get_and_release(std::__1::shared_ptr<Foo>) A.cc:37:38 (example+0xeb305)
Previous read of size 8 at 0x7b0c00001800 by thread T1:
#0 std::__1::__shared_count::__release_shared[abi:ue170006]() ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:172:9 (example+0xed8f0)
#1 std::__1::__shared_weak_count::__release_shared[abi:ue170006]() ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:213:27 (example+0xed889)
#2 std::__1::shared_ptr<Bar>::~shared_ptr[abi:ue170006]() ../llvm-17.0.6-n6964152/include/c++/v1/__memory/shared_ptr.h:772:23 (example+0xeb6d6)
#3 get_and_release(std::__1::shared_ptr<Foo>) A.cc:38:5 (example+0xeb313)
As far as I can tell, this should be safe: every thread holds its own copy of the smart pointers, and get_instance() is guarded by a mutex. What am I misunderstanding?
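The ownership assumption behind that reasoning can be sketched in a single thread. This is only an illustration of standard shared_ptr/weak_ptr behaviour, not part of the failing program: a weak_ptr does not keep the object alive, but once the last owner is gone, lock() returns an empty shared_ptr instead of a dangling one. The names Tracked, ExpiryResult and demo_expiry are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how often it is destroyed.
struct Tracked {
    explicit Tracked(int& counter) : destroyed(counter) {}
    ~Tracked() { ++destroyed; }
    int& destroyed;
};

struct ExpiryResult {
    bool alive_while_owned;      // lock() succeeded while an owner existed
    bool expired_after_release;  // lock() returned empty after the last owner died
    int destructor_calls;        // how many times ~Tracked ran
};

ExpiryResult demo_expiry() {
    int destroyed = 0;
    std::weak_ptr<Tracked> weak;
    bool alive = false;
    {
        std::shared_ptr<Tracked> owner = std::make_shared<Tracked>(destroyed);
        weak = owner;
        // lock() yields a second owner; both share one control block.
        std::shared_ptr<Tracked> copy = weak.lock();
        assert(owner.use_count() == 2);
        alive = (copy != nullptr);
    }  // both owners are gone here, so the Tracked object is destroyed once
    return {alive, weak.lock() == nullptr, destroyed};
}
```

Calling demo_expiry() gives alive_while_owned and expired_after_release both true, with exactly one destructor call.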
I also tried changing the std::weak_ptr<Foo> _ctx member of Bar into a std::shared_ptr<Foo>. That did not eliminate the report, but it did make the data race appear less often.
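For reference, a minimal sketch of that variant, reconstructed for illustration rather than copied from the code that was actually run. It assumes Bar is constructed from shared_from_this() instead of weak_from_this(), and it defines get_instance() outside the class body so that Bar is complete at that point. The helper foo_use_count_with_live_bar() is hypothetical and only shows the ownership effect of the change.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

class Bar;

class Foo : public std::enable_shared_from_this<Foo> {
public:
    std::shared_ptr<Bar> get_instance();  // defined below, once Bar is complete
private:
    std::weak_ptr<Bar> _ptr;
    std::mutex m_;
};

// Variant: Bar now holds a strong reference to its Foo.
class Bar {
public:
    explicit Bar(std::shared_ptr<Foo> ptr) : _ctx(std::move(ptr)) {}
private:
    std::shared_ptr<Foo> _ctx;
};

std::shared_ptr<Bar> Foo::get_instance() {
    std::lock_guard<std::mutex> lck(m_);
    std::shared_ptr<Bar> sh = _ptr.lock();
    if (!sh) {
        // Assumption: the strong reference comes from shared_from_this().
        sh = std::make_shared<Bar>(shared_from_this());
        _ptr = sh;
    }
    return sh;
}

// Hypothetical helper: reports how many owners Foo has while a Bar is alive.
long foo_use_count_with_live_bar() {
    std::shared_ptr<Foo> foo = std::make_shared<Foo>();
    std::shared_ptr<Bar> a = foo->get_instance();
    std::shared_ptr<Bar> b = foo->get_instance();
    assert(a == b);  // the second call reuses the live Bar
    return foo.use_count();
}
```

Run single-threaded, the helper returns 2: one owner is the local foo and the other is the live Bar's _ctx. The thread driver (get_and_release and main) stays the same as in the original listing.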