How does cache coherence work in multi-core and multi-processor architecture?

Let me explain my understanding and ask you to either confirm its correctness or correct me:

  1. The MESI protocol (https://en.wikipedia.org/wiki/MESI_protocol) allows for efficient cache coherence. It's the state-of-the-art mechanism.
  2. For several cores of a single processor, MESI operates via L3 cache which is shared among cores of a processor.
  3. For several processors (with no shared L3), MESI operates via Main Memory.
  4. When using global variables, which are read and written by several threads, volatile type specifier is used to prevent unwanted optimizations as well as to prevent caching in registers (not in L1-3 caches). Thus, if value is not in a register but in cache or main memory, MESI would do its work to make threads see correct values of globals.

Answer by David Schwartz:

For several cores of a single processor, MESI operates via L3 cache which is shared among cores of a processor.

MESI operates at all cache levels. In some processor designs, the L3 cache serves as an efficient "switchboard" between cores. For example, if the L3 cache is inclusive and holds everything that is in any core's L1 or L2 cache, then just knowing that a line isn't in the L3 cache is enough to know it's not in any other core's cache. This can reduce the amount of snooping needed. These are sophisticated optimizations, though.
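
To make the four states concrete, here is a toy sketch of how a single cache line's MESI state might change as cores read and write it. It assumes a simplified textbook version of the protocol; the names `ToyLine`, `read`, and `write` are made up for illustration, and real hardware adds write-backs, snoop filtering, and extra states (MESIF, MOESI) that are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The four MESI states a cache line can have in one core's cache.
enum class Mesi { Modified, Exclusive, Shared, Invalid };

// Toy model (hypothetical, not a real implementation): one cache line,
// with its state tracked separately for each core's private cache.
struct ToyLine {
    std::vector<Mesi> state;  // state[i] is the line's state in core i's cache

    explicit ToyLine(std::size_t cores) : state(cores, Mesi::Invalid) {}

    // Core `c` reads the line.
    void read(std::size_t c) {
        assert(c < state.size());
        if (state[c] != Mesi::Invalid) return;  // cache hit: no state change
        bool others_have_it = false;
        for (std::size_t i = 0; i < state.size(); ++i) {
            if (i == c || state[i] == Mesi::Invalid) continue;
            others_have_it = true;
            // A Modified or Exclusive holder drops to Shared. (Real hardware
            // would also have a Modified holder supply or write back the data.)
            state[i] = Mesi::Shared;
        }
        state[c] = others_have_it ? Mesi::Shared : Mesi::Exclusive;
    }

    // Core `c` writes the line: every other copy is invalidated first.
    void write(std::size_t c) {
        assert(c < state.size());
        for (std::size_t i = 0; i < state.size(); ++i)
            if (i != c) state[i] = Mesi::Invalid;
        state[c] = Mesi::Modified;
    }
};
```

For example, with `ToyLine line(2);`, calling `line.read(0)` leaves core 0 in Exclusive, a following `line.read(1)` moves both cores to Shared, and `line.write(1)` leaves core 1 in Modified and core 0 in Invalid. Note that main memory never appears as a participant with a state of its own.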

For several processors (with no shared L3), MESI operates via Main Memory.

I'm not sure what you're trying to say here, but it doesn't seem to correspond to anything true. MESI operates between caches. Memory isn't a cache and so has no need to participate in the MESI protocol.

You could mean that for CPUs without an L3 cache, the L2 inter-cache MESI traffic occurs on the same CPU bus as the one that connects to main memory. This used to be true for some multi-chip CPU designs before CPUs had on-chip memory controllers. But today, most laptop/desktop multi-core CPUs have on-die memory controllers, so the bus that connects to memory connects only to memory, and there's no MESI traffic on it. If data is in one core's L2 cache and has to get to another core's L2 cache, it doesn't travel through main memory. (Think about the topology of the cores and the memory controller: that would be insane.)

When using global variables, which are read and written by several threads, volatile type specifier is used to prevent unwanted optimizations as well as to prevent caching in registers (not in L1-3 caches).

I know of no language where this is true. It's certainly not true in C/C++, where volatile is for things like signals, not multithreading (at least on platforms with well-defined multithreading APIs). And it's not true in languages like Java, where volatile has specific language semantics that have nothing to do with registers.
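
As an illustration for the C++ case, here is a minimal sketch, assuming C++11 or later, of what a variable shared between threads would typically look like: `std::atomic` rather than `volatile`. The function name `count_with_atomic` is hypothetical. With a plain or `volatile int` counter, the same code would contain a data race, which is undefined behavior in C++.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each increment a shared counter
// `per_thread` times, and the final total is returned.
int count_with_atomic(int per_thread) {
    assert(per_thread >= 0);
    std::atomic<int> counter{0};  // atomic, not volatile

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // An atomic read-modify-write: no increment can be lost.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

Calling `count_with_atomic(100000)` returns 200000 on every run, because each `fetch_add` is indivisible. Relaxed ordering is enough here only because nothing else is being published through the counter.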

Thus, if value is not in a register but in cache or main memory, MESI would do its work to make threads see correct values of globals.

This could be true at the hardware/assembler level. That's where registers exist. But in practice it's not, because while MESI makes the memory caches coherent, modern CPUs have other optimizations that create the same kinds of problems. For example, a CPU might prefetch a read, or it might delay a write so that it completes out of order. So you need things like memory barriers in addition to MESI. This, of course, gets very platform-specific.
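
In portable C++ you would normally not write the barriers yourself but ask for ordering through `std::atomic`. The following is a minimal sketch, assuming C++11 or later, of the common release/acquire "message passing" pattern; the name `message_passing` and the value 42 are made up for illustration. The release store and the acquire load are what make the compiler and CPU emit whatever barriers the platform needs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes a value, another consumes it.
int message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;  // 1. write the data
        // 2. release: the write to payload cannot be reordered after this store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // 3. acquire: once true is observed, the payload write is visible too
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // 4. reads 42, never a stale value
    });

    producer.join();
    consumer.join();
    return seen;
}
```

`message_passing()` returns 42. If both operations on `ready` used `std::memory_order_relaxed` instead, the language would no longer guarantee that the consumer sees the payload write, even though the caches themselves stay coherent.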

You can think of MESI as an optimization. You still have to do whatever the platform requires in order for inter-thread memory visibility to work correctly. But MESI tremendously reduces what that work is.

Without MESI, for example, you might have a design where the only way for data to get from one core to another is through a write to main memory, followed by waiting for the write to complete, followed by a read from main memory. That would be a total disaster. First, you'd wind up having to flush things to main memory just in case another thread needed them. And second, all this traffic would choke out the regular memory traffic. Yuck.