As per information on clwb ordering (link),
"CLWB instruction is ordered only by store-fencing operations. For example, software can use an SFENCE, MFENCE, XCHG, or LOCK-prefixed instructions to ensure that previous stores are included in the write-back. CLWB instruction need not be ordered by another CLWB or CLFLUSHOPT instruction. CLWB is implicitly ordered with older stores executed by the logical processor to the same address."
If the sequence of operations on Intel x86-64 is as follows, can I remove the "sfence" and still ensure correctness, given that write(A) and write(B) are each cache-line aligned?
I am asking because on Intel write(A) and write(B) are ordered (TSO), and write(A)->clwb(A) and write(B)->clwb(B) are ordered as per the description of clwb quoted above.
write(A)
clwb(A)
sfence()
write(B)
clwb(B)
I am making the following assumptions:
- the compiler does not reorder these operations
- the clwb() instruction writes back the dirty line all the way to the persistent domain, so the write(A)->clwb(A) pair ensures that the modified value of A is in the persistent domain
Can removing the sfence break correctness? If yes, in what scenarios? Thanks.
For normal stores to WB memory that are both within the same cache line: yes, persistence order matches x86-TSO global-observability order; see "Is clflush or clflushopt atomic when system crash?". Otherwise that's not guaranteed.
It seems you mean A is fully contained within one cache line, and B within a separate one.
Without SFENCE, after a crash it would be possible to see the effect of B but not A.
clwb isn't ordered, so the later one could make its store persistent first. That's what the manual is pointing out with clwb's lack of ordering wrt. normal stores.
Regarding TSO: no, x86-TSO ordering is about the order of commit from the store buffer to L1d, the point of global observability. That's of course totally separate from eventual write-back (via eviction or clwb) to DRAM. Store uops can execute (write their address+data to the store buffer) in any order, but can't commit until after retirement (i.e. when they're non-speculative). Additionally, that commit is restricted to happen in program order, i.e. the order store-buffer entries were allocated in during issue/rename/allocate.
No, the "implicitly ordered with older stores ... to the same address" rule only guarantees that store + clwb to the same address will write-back a version of the line that includes that store-data. Otherwise it could write-back a copy of the line while the latest store was still in the store buffer or not even executed. It doesn't mean that the whole write-back has to finish before any later stores!
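The order of operations that produces B but not A visible after a crash is the following:
- store A commits to L1d
- store B commits to L1d
- clwb B writes back B's line to the persistence domain
- crash, before clwb A has written back A's line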
In terms of asm instruction reordering, the following reordering is allowed:
store A
store B
clwb B
clwb A
Of course order of execution vs. reaching the end of the store buffer vs. actual persistent commit are all separate things at least in theory, but if you want to simplify it to all steps of an instruction happening before any effects of another instruction, this reordering is still compatible with all the rules.
I think the key thing you're missing is that clwb A is a separate operation from store A, it doesn't stay stuck to it. That clwb is allowed to "happen" after other later stores. store B is to a different address, so it doesn't order clwb A.
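The rules discussed here can be checked with a small toy model. This is only a sketch, not a hardware simulator: the event names (ST_A, WB_A, ST_B, WB_B) and both functions are hypothetical, and it assumes, as a simplification, that SFENCE makes the write-back of A complete before store B commits. It enumerates every order of the four events and asks whether any allowed order persists B before A.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model events: when each store commits to L1d and when each
   write-back reaches the persistence domain. Illustration only. */
enum { ST_A, WB_A, ST_B, WB_B, N_EVENTS };

/* pos[e] is the time slot (0..3) at which event e takes effect. */
static bool order_is_allowed(const int pos[N_EVENTS], bool with_sfence)
{
    if (pos[ST_A] > pos[ST_B]) return false; /* TSO: stores commit in program order */
    if (pos[ST_A] > pos[WB_A]) return false; /* clwb A is ordered after store A (same address) */
    if (pos[ST_B] > pos[WB_B]) return false; /* clwb B is ordered after store B (same address) */
    /* Modeling assumption: sfence makes the write-back of A complete
       before store B commits. */
    if (with_sfence && pos[WB_A] > pos[ST_B]) return false;
    return true;
}

/* True if some allowed order persists B before A, so that a crash
   between the two write-backs leaves B durable without A. */
bool b_can_persist_before_a(bool with_sfence)
{
    bool found = false;
    int perms = 0;
    for (int code = 0; code < 256; code++) {   /* 4 events, 2 bits each */
        int pos[N_EVENTS];
        unsigned seen = 0;
        for (int e = 0; e < N_EVENTS; e++) {
            pos[e] = (code >> (2 * e)) & 3;
            seen |= 1u << pos[e];
        }
        if (seen != 0xFu) continue;            /* skip: two events share a slot */
        perms++;
        if (order_is_allowed(pos, with_sfence) && pos[WB_B] < pos[WB_A])
            found = true;
    }
    assert(perms == 24);                       /* all 4! orders were examined */
    return found;
}
```

Under these assumptions, calling b_can_persist_before_a(false) finds an allowed order (store A, store B, write-back B, write-back A), while b_can_persist_before_a(true) finds none, because the fence forces write-back A ahead of store B.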
An SFENCE can prevent this.
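With compiler intrinsics, the fenced sequence might look like the sketch below. The struct two_lines layout and the helper names writeback_line, store_fence and persist_a_then_b are hypothetical, chosen for illustration. It assumes GCC or Clang; _mm_clwb is used only when the compiler is targeting CLWB (for example with -mclwb), otherwise _mm_clflush serves as a slower and more strongly ordered stand-in, and on non-x86 targets the helpers do nothing.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical layout: A and B each sit in their own 64-byte cache line. */
struct two_lines {
    _Alignas(64) uint64_t a;
    _Alignas(64) uint64_t b;
};

/* Request write-back of the cache line containing p. */
static inline void writeback_line(const void *p)
{
#if defined(__CLWB__)
    _mm_clwb((void *)p);      /* weakly ordered write-back, line may stay cached */
#elif defined(__x86_64__)
    _mm_clflush(p);           /* stand-in when not compiling for CLWB */
#else
    (void)p;                  /* no-op placeholder on other architectures */
#endif
}

static inline void store_fence(void)
{
#if defined(__x86_64__)
    _mm_sfence();
#endif
}

/* Store A, write it back, fence, then do the same for B. */
void persist_a_then_b(struct two_lines *r, uint64_t a, uint64_t b)
{
    assert(((uintptr_t)r % 64) == 0);   /* each field must start a cache line */

    /* volatile stores keep the compiler from reordering them around the
       write-back and fence calls (the question assumes no such reordering). */
    *(volatile uint64_t *)&r->a = a;
    writeback_line(&r->a);
    store_fence();            /* orders the write-back of A before the store to B */

    *(volatile uint64_t *)&r->b = b;
    writeback_line(&r->b);
    store_fence();            /* orders the write-back of B before later stores */
}
```

Dropping the first store_fence() call gives the unfenced sequence from the question, where the write-back of A may be delayed past the store and write-back of B.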