I am migrating a project that was run on bare-bone to linux, and need to eliminate some {disable,enable}_scheduler
calls. :)
So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit to the usual acquire-release ordering:
class RWSync {
std::atomic<int> version; // incremented after every modification
std::atomic_bool invalid; // true during write
public:
RWSync() : version(0), invalid(0) {}
template<typename F> void sync(F lambda) {
int currentVersion;
do {
do { // wait until the object is valid
currentVersion = version.load(std::memory_order_acquire);
} while (invalid.load(std::memory_order_acquire));
lambda();
std::atomic_thread_fence(std::memory_order_seq_cst);
// check if something changed
} while (version.load(std::memory_order_acquire) != currentVersion
|| invalid.load(std::memory_order_acquire));
}
void beginWrite() {
invalid.store(true, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
void endWrite() {
std::atomic_thread_fence(std::memory_order_seq_cst);
version.fetch_add(1, std::memory_order_release);
invalid.store(false, std::memory_order_release);
}
}
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite
, and read the payload only inside the lambda function passed to sync()
.
As you can see, here I have an atomic store in beginWrite()
where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea as the name of this class :)
The code is basically correct.
Instead of having two atomic variables (
version
andinvalid
) you may use singleversion
variable with semantic "Odd values are invalid". This is known as "sequential lock" mechanism.Reducing number of atomic variables simplifies things a lot:
Note the difference in memory orders in
beginWrite()
andendWrite()
:endWrite()
makes sure that all previous object's modifications have been completed. It is sufficient to use release memory order for that.beginWrite()
makes sure that reader will detect object being in invalid state before any futher object's modification is started. Such garantee requires seq_cst memory order. Because of that reader uses seq_cst memory order too.As for fences, it is better to incorporate them into previous/futher atomic operation: compiler knows how to make the result fast.
Explanations of some modifications of original code:
1) Atomic modification like
fetch_add()
is intended for cases, when concurrent modifications (like anotherfetch_add()
) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific things.Atomic assignment (
store()
) does not use memory locking, so it is cheaper thanfetch_add()
. You may use such assignment because concurrent modifications are not possible in your case (reader does not modifyversion
).2) Unlike to release-acquire semantic, which differentiate
load
andstore
operations, sequential consistency (memory_order_seq_cst
) is applicable to every atomic access, and provide total order between these accesses.