Does clwb take care of the write in the store buffer?


The Intel software developer's manual says that clwb "Writes back to memory the cache line (if modified) that contains the linear address specified with the memory operand from any level of the cache hierarchy in the cache coherence domain. The line may be retained in the cache hierarchy in non-modified state. clwb is ordered with respect to older writes to the cache line being written back".

My question concerns the pseudocode below:

write(A)
clwb (A)

Does clwb take care of the write that may still be in the store buffer? Or do I need an sfence after the write and before the clwb, like this:

write (A)
sfence
clwb (A)

I want to know whether the sfence is actually required. Thanks.


There are 2 answers

1
Hadi Brais On BEST ANSWER

On Intel processors, the clwb instruction is ordered with respect to older writes to the same cache line. On AMD processors, according to Section 7.6.3 of Volume 2 of the AMD manual No. 24593, the clwb instruction is ordered with respect to older writes to the same cache line if the memory type of the target address is a cacheable memory type (i.e., WB, WT, or WP) at the time of executing the clwb instruction.

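This ordering guarantee means that, at some point after the clwb instruction retires, the line is written back to the persistence domain (if necessary) in a state that is at least as recent, in program order, as the state produced by the older write. In other words, the write-back cannot miss a store that is still sitting in the store buffer, so no sfence is needed between the write and the clwb. Note that the persistence domain is defined by the platform.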
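To make the first pattern from the question concrete, here is a minimal C sketch using compiler intrinsics. The helper names `writeback_line` and `store_and_writeback` are made up for illustration. It assumes GCC or Clang on x86-64, and it only emits clwb when the compiler is told the target supports it (for example with `-mclwb`); otherwise it falls back to clflush, which also writes the line back but evicts it. On other architectures the write-back is a no-op placeholder.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>  /* _mm_clwb, _mm_clflush */
#endif

/* Hypothetical helper: write back the cache line that contains p. */
static inline void writeback_line(void *p)
{
#if defined(__x86_64__) && defined(__CLWB__)
    _mm_clwb(p);        /* clwb: write back, line may stay cached */
#elif defined(__x86_64__)
    _mm_clflush(p);     /* fallback: write back and invalidate the line */
#else
    (void)p;            /* assumption: no equivalent here, placeholder only */
#endif
}

/* Hypothetical helper: write(A) followed by clwb(A), no sfence in between. */
static inline uint64_t store_and_writeback(uint64_t *addr, uint64_t value)
{
    *addr = value;          /* write(A): may still be in the store buffer */
    writeback_line(addr);   /* clwb(A): ordered after the older store to the same line */
    return *addr;
}
```

Whether the data has actually reached the persistence domain when the function returns is a separate matter; the sketch only shows that no fence has to sit between the store and the write-back.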

0
Nene Du On

Here is my answer to the follow-up question: Does this mean that, with a single thread of execution, the sequence "store A, clwb(A), store B, clwb(B)" is correct without sfence on Intel x86-64, since TSO ensures that store(A) and store(B) are ordered, clwb(A) is ordered with store(A), and clwb(B) is ordered with store(B)?

clwb instructions that flush different cache lines are not ordered with respect to each other. TSO only guarantees that stores become globally visible in program order (i.e., they are written to the cache in program order). So in your example, store A always completes before store B in the cache hierarchy, but the data of store B could reach memory (either volatile or non-volatile) before the data of store A. If you only need the stores to be ordered in the cache hierarchy, no sfence is required.

But if you need to guarantee that store A always reaches memory before store B, you need to insert an sfence between clwb(A) and store(B).
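As a rough illustration of that ordering, the sequence could look like this in C. The helper names (`writeback_line`, `store_fence`, `persist_in_order`) are hypothetical, and the same assumptions apply: GCC or Clang on x86-64, clwb only when enabled at compile time (e.g. `-mclwb`), clflush as a stand-in otherwise, and no-op placeholders on other architectures.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>  /* _mm_clwb, _mm_clflush, _mm_sfence */
#endif

/* Hypothetical helper: write back the cache line that contains p. */
static inline void writeback_line(void *p)
{
#if defined(__x86_64__) && defined(__CLWB__)
    _mm_clwb(p);
#elif defined(__x86_64__)
    _mm_clflush(p);     /* fallback: also writes back, but evicts the line */
#else
    (void)p;            /* assumption: placeholder on non-x86 targets */
#endif
}

/* Hypothetical helper: sfence on x86-64, nothing elsewhere. */
static inline void store_fence(void)
{
#if defined(__x86_64__)
    _mm_sfence();
#endif
}

/* store A, clwb(A), sfence, store B, clwb(B).
 * a and b are assumed to point into different cache lines. */
static inline void persist_in_order(uint64_t *a, uint64_t va,
                                    uint64_t *b, uint64_t vb)
{
    *a = va;
    writeback_line(a);  /* clwb(A) */
    store_fence();      /* orders clwb(A) before the later store B and clwb(B) */
    *b = vb;
    writeback_line(b);  /* clwb(B) */
}
```

A caller would pass two addresses at least one cache line apart, for example `&buf[0]` and `&buf[8]` of a `uint64_t buf[16]`, assuming 64-byte lines. Dropping the `store_fence()` call gives the fence-free variant from the follow-up question, which keeps the stores ordered in the cache hierarchy only.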