As far as I know, concurrent access to a variable needs some kind of synchronization (mutex, atomic, memory barrier...); otherwise a read in one thread may never see the updated value, no matter how many times it tries.
However, my colleague says that the MESI protocol (ignoring CPUs that have no MESI or similar protocol) automatically synchronizes the CPU caches: if a thread reads a variable that was updated by another thread, with no synchronization on either the read or the write (just a plain read, for example "if(a != 0)"), then after some time the read will eventually get the updated value if it keeps trying. I think there is no such guarantee.
So I wrote some code to test this:
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

volatile int * volatile a = 0; // avoid compiler reorder

// Writer: publishes a non-null pointer with a plain (non-atomic) store.
void set() {
    a = new int(1);
    std::cout << "set complete" << std::endl;
}

// Reader: spins on a plain (non-atomic) load until it sees a non-null pointer.
void read(int i) {
    while(1) {
        if(a != 0) {
            std::cout << i << " detected" << std::endl;
            break;
        }
    }
}

int main()
{
    std::thread td00(std::bind(read, 0));
    std::thread td01(std::bind(read, 1));
    std::thread td02(std::bind(read, 2));
    std::thread td03(std::bind(read, 3));
    std::thread td04(std::bind(read, 4));
    // wait a moment to make sure 'set' gets called after 'read' runs
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    std::thread td1(set);
    td1.join();
    td00.detach();
    td01.detach();
    td02.detach();
    td03.detach();
    td04.detach();
    // keep the process alive so the detached readers can keep polling
    std::this_thread::sleep_for(std::chrono::minutes(60));
    return 0;
}
However, the outcome can be affected by many factors: sometimes it blocks, sometimes it prints "detected". So it cannot serve as strong proof.
I have searched for this, but the docs were unclear. It seems MESI can indeed do "auto sync" (the programmer does not need to do anything); the 'PrRd' and 'PrWr' requests seem to be just normal read and write requests, without LOCK or CMPXCHG or anything like that. However, to speed things up, CPUs introduce a store buffer, which lets memory operations become visible out of order and defeats the "auto sync" effect. To fix the reordering, the programmer needs to use tools (memory barriers) to control it. That means the programmer has to synchronize manually to make things right.
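To make the store buffer effect concrete, here is a minimal sketch of the classic "store buffering" litmus test. The helper name `count_both_zero` and the iteration count are my own assumptions for illustration. Each thread stores to one variable and then loads the other. With `std::memory_order_seq_cst` the C++ memory model forbids both loads returning 0; with `std::memory_order_relaxed` that outcome is allowed, and a store buffer is one hardware mechanism that can produce it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from any library): runs the store-buffering
// litmus test `iterations` times with the given memory order and returns
// how many runs ended with BOTH threads reading 0.
// Only pass memory_order_relaxed or memory_order_seq_cst here; other
// orders are not valid for both a store and a load.
int count_both_zero(int iterations, std::memory_order order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, order);   // may sit in a store buffer for a while
            r1 = y.load(order);  // can this load run "ahead" of the store?
        });
        std::thread t2([&] {
            y.store(1, order);
            r2 = x.load(order);
        });

        // join() synchronizes, so reading r1 and r2 afterwards is safe.
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling `count_both_zero(1000, std::memory_order_seq_cst)` must return 0 on a conforming implementation. With `std::memory_order_relaxed` a nonzero count is permitted, although whether it actually shows up depends on the hardware and on thread timing.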
Do I understand this correctly? If so, assuming the programmer does not synchronize manually, is there any guaranteed upper bound on the delay before a read gets the updated value? I think a read may never get the updated value, but I cannot find evidence for that.
The conclusion is: x86_64 is cache coherent; a normal, simple write becomes globally visible to all other cores or CPUs that share one bus.
However, this is useless for writing normal application code (as opposed to low-level things like a compiler or an OS kernel). The language memory model completely hides the cache coherence protocol from the coder. Code should not rely on or exploit features of that protocol, because the compiler, the language virtual machine, or the runtime can reorder and optimize your code. Even if you know exactly what the hardware will do, writing code that disobeys the language memory model is still subtle and error-prone.
One possibility is that the example code in the question prints "x detected" even before the set function is called (a reference to show how that can happen); another is that, without the volatile keyword, the variable a is kept in a register, which makes MESI powerless. On top of that, most languages have no counterpart to the C/C++ volatile keyword, which lets the coder tell just the compiler not to "change" the original code.
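For comparison, here is a minimal sketch of how the test from the question could be written within the C++ memory model, using `std::atomic` instead of `volatile`. The names `publish`, `wait_for_value` and `run_demo` are hypothetical, chosen only for this illustration. The release store paired with an acquire load ensures that a reader which sees the non-null pointer also sees the `int` it points to. The standard only says that atomic stores should become visible to other threads in a reasonable amount of time, so this still gives no hard bound on the delay.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Atomic counterpart of the question's `volatile int * volatile a`.
std::atomic<int*> a{nullptr};

// Writer (hypothetical name): publishes the pointer with a release store,
// so the initialization of the int happens-before any acquire load that
// observes the non-null pointer. The allocation is intentionally leaked
// to keep the sketch short.
void publish() {
    a.store(new int(1), std::memory_order_release);
}

// Reader (hypothetical name): polls with acquire loads until the pointer
// is non-null, then returns the pointed-to value.
int wait_for_value() {
    int* p = nullptr;
    while ((p = a.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();  // be polite while spinning
    }
    return *p;
}

// Hypothetical driver: starts `readers` polling threads, publishes once,
// joins everything, and returns the sum of the values the readers saw.
int run_demo(int readers) {
    std::vector<int> seen(readers, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < readers; ++i) {
        // Each thread writes only its own element, so there is no data race.
        threads.emplace_back([&seen, i] { seen[i] = wait_for_value(); });
    }

    std::thread writer(publish);
    writer.join();
    for (auto& t : threads) {
        t.join();
    }

    int sum = 0;
    for (int v : seen) {
        sum += v;
    }
    return sum;
}
```

Calling `run_demo(5)` from `main` should return 5 once every reader has observed the published value. Unlike the `volatile` version, the compiler is not allowed to hoist the load out of the loop or keep `a` in a register here.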