clwb+sfence, can we remove sfence if writes are cache-line aligned?


According to the documentation of clwb ordering:

"CLWB instruction is ordered only by store-fencing operations. For example, software can use an SFENCE, MFENCE, XCHG, or LOCK-prefixed instructions to ensure that previous stores are included in the write-back. CLWB instruction need not be ordered by another CLWB or CLFLUSHOPT instruction. CLWB is implicitly ordered with older stores executed by the logical processor to the same address."

Given the following sequence of operations on Intel x86-64, can I remove the sfence and still guarantee correctness if write(A) and write(B) are cache-line aligned?

I am asking because on Intel, write(A) and write(B) are ordered (TSO), and write(A)->clwb(A) and write(B)->clwb(B) are ordered according to the description of clwb quoted above.

write(A)
clwb(A)
sfence()
write(B)
clwb(B)
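For concreteness, here is a minimal sketch of the sequence above in C using the `_mm_clwb` and `_mm_sfence` intrinsics from `<immintrin.h>`, assuming GCC or Clang on x86-64. The `cache_line` struct and the `write_a_then_b` and `cpu_has_clwb` helpers are hypothetical names for illustration. The `atomic_signal_fence` calls are compiler-only barriers standing in for the no-reordering assumption listed below. On ordinary (non-persistent) memory the write-backs simply go to DRAM, so this shows the instruction sequence, not persistence itself.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang, x86 only) */
#include <immintrin.h>  /* _mm_clwb, _mm_sfence */
#include <stdalign.h>
#include <stdatomic.h>  /* atomic_signal_fence, used as a compiler-only barrier */
#include <stdint.h>

/* Hypothetical layout: A and B each sit alone in their own 64-byte line. */
struct cache_line { alignas(64) uint64_t value; };
static struct cache_line A, B;

/* CLWB support is reported in CPUID.(EAX=7, ECX=0):EBX bit 24. */
static int cpu_has_clwb(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 24) & 1u;
}

/* target("clwb") lets GCC/Clang emit CLWB here without building with -mclwb;
   the CPU must still support it at run time. */
__attribute__((target("clwb")))
static void write_a_then_b(uint64_t a, uint64_t b)
{
    A.value = a;                                /* write(A) */
    atomic_signal_fence(memory_order_seq_cst);  /* keep the compiler from reordering */
    _mm_clwb(&A.value);                         /* clwb(A) */
    _mm_sfence();                               /* the fence in question */
    atomic_signal_fence(memory_order_seq_cst);
    B.value = b;                                /* write(B) */
    atomic_signal_fence(memory_order_seq_cst);
    _mm_clwb(&B.value);                         /* clwb(B) */
}
```

Deleting the `_mm_sfence()` line gives the variant the question asks about.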

I am making the following assumptions:

  1. compiler does not reorder these operations
  2. the clwb instruction writes the dirty line back to the persistence domain, so the write(A)->clwb(A) pair ensures that the modified value of A reaches the persistence domain

Can removing the sfence break correctness? If so, in what scenarios? Thanks.

Answer by Peter Cordes (best answer):

For normal stores to WB memory that are both within the same cache line: yes, persistence order matches x86-TSO global-observability order; see the question "Is clflush or clflushopt atomic when system crash?". Otherwise, that's not guaranteed.

It seems you mean A is fully contained within one cache line, and B within a separate one.

Without SFENCE, after a crash it would be possible to see the effect of B but not A. clwb isn't ordered, so the later one could make its store persistent first. That's what the manual is pointing out with clwb's lack of ordering wrt. normal stores.

"So according to TSO, if write(B) happened then write(A) happened (maybe it is still in the store buffer)."

No, x86-TSO ordering is about the order of commit from the store buffer to L1d, the point of global observability. That's of course totally separate from eventual write-back (via eviction or clwb) to DRAM. Store uops can execute (write their address+data to the store buffer) in any order, but can't commit until after retirement (i.e. when they're non-speculative). Additionally, that commit is restricted to happen in program order, i.e. the order in which store-buffer entries were allocated during issue/rename/allocate.
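A toy model may make the distinction clearer. This sketch (hypothetical names, not how the hardware is actually built) models only the store buffer: entries are allocated in program order, can be executed in any order, and commit to L1d only from the head. It assumes every store has already retired, and it says nothing about when lines are later written back to DRAM, which is a separate process.

```c
#include <assert.h>
#include <stdbool.h>

#define SB_SIZE 8

/* Toy store buffer: entries are allocated in program order at issue/rename. */
struct sb_entry { char addr; int data; bool executed; };
struct store_buffer { struct sb_entry e[SB_SIZE]; int head, tail; };

static int sb_allocate(struct store_buffer *sb, char addr)
{
    assert(sb->tail < SB_SIZE);
    sb->e[sb->tail] = (struct sb_entry){ .addr = addr, .executed = false };
    return sb->tail++;             /* index of the new entry */
}

/* Execution (writing address + data) may happen in any order. */
static void sb_execute(struct store_buffer *sb, int idx, int data)
{
    sb->e[idx].data = data;
    sb->e[idx].executed = true;
}

/* Commit to L1d (the point of global visibility) only from the head, so in
   program order. Assumes every entry has already retired. Records the
   committed addresses in 'order' and returns how many committed. */
static int sb_commit(struct store_buffer *sb, char *order)
{
    int n = 0;
    while (sb->head < sb->tail && sb->e[sb->head].executed)
        order[n++] = sb->e[sb->head++].addr;
    return n;
}
```

Executing store B first does not let it commit ahead of store A; it just waits in the second slot until A has executed.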

"Meaning write(A)->write(B) are ordered and write(B)->clwb(B) are ordered, so how can clwb(B) bypass write(B) [thus violating the ordering constraint in the manual] and happen before clwb(A), making the effect of clwb(B) visible after a crash but not that of clwb(A)?"

No, the "implicitly ordered with older stores ... to the same address" rule only guarantees that store + clwb to the same address will write-back a version of the line that includes that store-data. Otherwise it could write-back a copy of the line while the latest store was still in the store buffer or not even executed. It doesn't mean that the whole write-back has to finish before any later stores!
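In the same toy-model spirit, this sketch shows what the same-address rule does and does not give you: the clwb request captures a version of the line that includes the older store's data, but it only requests the write-back, which is still in flight afterwards. Committing the pending store inside `clwb_request` is a simplification of the hardware's ordering, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one cache line: its value in L1d plus at most one older
   store to it that is still sitting in the store buffer. */
struct line_state { int l1d_value; int pending_store; bool store_in_sb; };

/* A write-back request: which version of the line it carries, and whether
   it has reached the persistence domain yet. */
struct writeback { int value; bool done; };

/* clwb is ordered after older stores to the same address. The model
   expresses that by committing the pending store before taking the
   snapshot, so the requested write-back includes the store's data. */
static struct writeback clwb_request(struct line_state *line)
{
    if (line->store_in_sb) {
        line->l1d_value = line->pending_store;
        line->store_in_sb = false;
    }
    /* Only requested here: completion happens later, asynchronously. */
    return (struct writeback){ .value = line->l1d_value, .done = false };
}
```

The returned request carries write(A)'s data, yet nothing in it says the write-back has finished, so later stores to other lines are free to proceed.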

The order of operations that produces B but not A visible after a crash is the following:

  • A and B execute in some order
  • A and B commit to L1d cache once this core has MESI exclusive ownership of their respective lines, becoming globally visible to other cores.
  • clwb instructions executed at some point, requesting the cache lines be written-back to DRAM at some point after the stores commit.
  • write-back of line A starts at some point after it commits to L1d, and the same for line B. They could start in either order since clwb's order isn't guaranteed wrt. other clwb operations to other lines, although in practice they likely start in program order.
  • clwb-B finishes: line B is now persistent.
  • machine loses power, before the in-flight clwb-A made it to the persistence domain. You didn't request the clwb operations be ordered wrt. each other, so this is allowed.
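The crash scenario in the list above can be checked by brute force in a toy model (hypothetical, not how the hardware is specified): treat the completion of each write-back as a single event, try both completion orders and every crash point, and ask whether B can be persistent while A is not. As a simplifying assumption, the SFENCE case is modeled by forbidding B's write-back from completing before A's.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the only events that matter after a crash are the completions
   of the two write-backs. first_done == 0 means A's write-back completes
   first, first_done == 1 means B's does. */
static bool b_without_a_reachable(bool sfence)
{
    for (int first_done = 0; first_done < 2; first_done++) {
        /* Simplifying assumption: with SFENCE after clwb(A), store B (and
           therefore B's write-back) cannot take effect before A's
           write-back has completed. */
        if (sfence && first_done == 1)
            continue;
        /* Power can fail after 0, 1 or 2 write-backs have completed. */
        for (int crash_after = 0; crash_after <= 2; crash_after++) {
            bool a_persistent = false, b_persistent = false;
            for (int k = 0; k < crash_after; k++) {
                bool is_a = (k == 0) == (first_done == 0);
                if (is_a) a_persistent = true;
                else      b_persistent = true;
            }
            if (b_persistent && !a_persistent)
                return true;
        }
    }
    return false;
}
```

Without the fence, the order "B completes, then power fails" is enough to leave B persistent without A; with the fence, no crash point produces that state.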

In terms of asm instruction reordering, the following reordering is allowed:

 store A
 store B
 clwb  B
 clwb  A     ; not ordered wrt. store B or clwb B

Of course, order of execution, reaching the end of the store buffer, and actual persistent commit are all separate things, at least in theory. But if you want to simplify it to all steps of one instruction happening before any effects of another instruction, this reordering is still compatible with all the rules.
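The claim that this reordering satisfies all the rules can be checked mechanically under that same simplification (each operation happens all at once). In this sketch the rules are hand-encoded from the manual quote and from x86-TSO; the `allowed` function and the operation names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum op { STORE_A, STORE_B, CLWB_A, CLWB_B, N_OPS };

/* Is this total order of the four operations allowed? Simplification: each
   operation happens all at once. Rules encoded by hand:
   - x86-TSO: store A before store B (store-buffer commit order)
   - clwb is ordered after older stores to the same address
   - with SFENCE between clwb A and store B: clwb A before store B
   Without the fence, nothing orders clwb A against store B or clwb B. */
static bool allowed(const enum op order[N_OPS], bool sfence)
{
    int pos[N_OPS];
    for (int i = 0; i < N_OPS; i++)
        pos[order[i]] = i;
    if (pos[STORE_A] > pos[STORE_B]) return false;
    if (pos[STORE_A] > pos[CLWB_A])  return false;
    if (pos[STORE_B] > pos[CLWB_B])  return false;
    if (sfence && pos[CLWB_A] > pos[STORE_B]) return false;
    return true;
}
```

The reordered sequence above passes without the fence and is rejected once the SFENCE constraint is added, while program order is accepted in both cases.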

I think the key thing you're missing is that clwb A is a separate operation from store A, it doesn't stay stuck to it. That clwb is allowed to "happen" after other later stores. store B is to a different address, so it doesn't order clwb A.

An SFENCE can prevent this.