SFENCE prevents NT stores from committing from the store buffer ahead of SFENCE itself.
NT store data enters an LFB directly from the store buffer.
Therefore SFENCE can only guarantee the order in which data enters the LFB.
For example,
movnti;
sfence;
movnti to another address;
The SFENCE here can only guarantee that the first NT store is committed to an LFB before the next one. However, since the LFB is volatile, the data has not been persisted at that point. Will the data that enters the LFB be persisted in the order in which it entered?
sfence ensures that all earlier stores in program order become globally observable before any later stores in program order become globally observable. Stores here include data store uops, clflush, clflushopt, clwb, movdiri, and movdir64b. The point of global observability (GO) depends on several factors.
For example, on a modern Intel server processor, a normal data store uop without the NT hint that targets a WB-type memory location mapped to main memory reaches GO once the target cache line has been fetched from memory (if it is not already present in the L1D in a suitable coherence state) and the store has been committed to the cache. That's why, on an Asynchronous DRAM Refresh (ADR) platform such as Intel CSX, sfence by itself doesn't guarantee persistence.

Regarding the specific example you're asking about, movnti is a data store instruction with the NT hint. Assuming that the target address is mapped to main memory on an ADR platform, the point of global observability of this instruction is the same as the first point of the persistence domain. Therefore, on any Intel or AMD platform with NVDIMMs, and regardless of the memory type, the data of the first store is guaranteed to be in the persistence domain before any later stores become persistent. This is a stronger guarantee than the one you stated (that sfence prevents later stores from committing before earlier stores), because commit doesn't imply persistence, but persistence can only happen after commit. It may also be better here to use the term "retire" instead of "commit": "retire" is architecturally meaningful and indicates a change to the thread's state, whereas "commit" is a microarchitectural operation that depends on the design.