How does the CPU cache work when physical addresses are not contiguous?


E.g. an array in my process uses contiguous virtual addresses, but the CPU cache works on physical addresses, which may not be contiguous for my array. Does the CPU cache fail in this scenario, or does the CPU or OS do something to help?

Do they do something, or just let it go?


1 Answer

Peter Cordes

Caches don't "fail"; they're already designed to cache potentially scattered lines, not only big contiguous arrays. (And to avoid homonym/synonym problems even when you have multiple virtual mappings of the same physical page.)

Some lines of the same array might have the same index (map to the same set) in L2 cache even if they're only 4K apart in virtual address but a multiple of some larger power of 2 apart in physical address. Set-associative (not direct-mapped) caches can deal with this unless you get very unlucky; then you might get some extra evictions (and conflict misses) you didn't expect. See the sketch below.
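To make that concrete, here's a minimal C sketch. The cache geometry is assumed (a physically-indexed 256 KiB, 4-way, 64 B-line L2, like the Skylake example further down), and the physical frame addresses are made up for illustration: two lines that are adjacent virtually can land in the same set when their physical frames happen to be a way-size apart.

```c
#include <stdio.h>
#include <stdint.h>

/* Assumed physically-indexed L2: 256 KiB, 4-way, 64-byte lines.
 * Way size = 256 KiB / 4 = 64 KiB, i.e. 64 KiB / 64 B = 1024 sets. */
#define LINE_SIZE 64
#define NUM_SETS  1024

static unsigned l2_set(uint64_t paddr) {
    return (unsigned)((paddr / LINE_SIZE) % NUM_SETS);
}

int main(void) {
    /* Two lines that are adjacent virtually (4 KiB apart), but whose
     * physical frames happen to be 64 KiB (one way size) apart. */
    uint64_t line_a = 0x120000;           /* hypothetical physical address */
    uint64_t line_b = line_a + 64 * 1024; /* exactly one way size away     */
    printf("set A = %u, set B = %u\n", l2_set(line_a), l2_set(line_b));
    /* Both print set 0: these lines compete for the same 4 ways. */
    return 0;
}
```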

A large page / hugepage (madvise(MADV_HUGEPAGE)) hints the Linux kernel to use transparent hugepages for a memory region, so contiguous virtual addresses are also contiguous physical addresses in 2MiB chunks (on x86-64, for example), instead of only 4K.
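For example, on Linux you could request this with madvise(2); a minimal sketch (note the kernel treats MADV_HUGEPAGE as a hint and is free to ignore it):

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    size_t len = 16 * 1024 * 1024;      /* 16 MiB, a multiple of 2 MiB */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint: back this region with transparent hugepages if possible, so
     * each 2 MiB of contiguous virtual space is contiguous physically. */
    if (madvise(p, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");   /* non-fatal: it's only a hint */

    /* ... touch the memory; check AnonHugePages in /proc/self/smaps ... */
    munmap(p, len);
    return 0;
}
```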


Many x86 CPUs have an L1d that's associative enough and small enough that VIPT = PIPT: all the index bits come from the offset-within-page part of the address, so whether two lines alias the same set depends only on the offset-within-page part of the address. (Tags are physical, so you can get cache hits even when you have two virtual mappings of the same physical page.)
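The condition for VIPT = PIPT is that the index + line-offset bits fit within the page offset, i.e. cache_size / associativity <= page_size. A quick sanity check with assumed-typical numbers (a 32 KiB, 8-way L1d and 4 KiB pages):

```c
#include <stdio.h>

int main(void) {
    /* Assumed-typical x86 L1d: 32 KiB, 8-way, with 4 KiB pages. */
    unsigned cache_size = 32 * 1024, ways = 8, page_size = 4096;

    /* Index + line-offset bits fit inside the page offset
     * iff cache_size / ways <= page_size. */
    unsigned bytes_per_way = cache_size / ways;   /* 4096 here */
    printf("VIPT behaves as PIPT: %s\n",
           bytes_per_way <= page_size ? "yes" : "no");
    return 0;
}
```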

Some other ISAs rely on the OS's help, e.g. page colouring, making the low 1 or 2 bits of the virtual page number match the physical page-frame number to avoid cache synonym/homonym problems in L1i/L1d. This makes the effective offset-within-page part of the address wider by a couple of bits for the purpose of indexing a set, which is also what matters for cache conflict misses. (See Virtually indexed physically tagged cache Synonym for an example of how this shakes out with cache index and tag bits.)
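As a rough sketch of the idea (the 2-bit colour count and the addresses here are made up for illustration, not taken from any particular ISA):

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch of page colouring with 4 colours (2 bits): the OS only maps a
 * virtual page to a physical frame whose low page-number bits match. */
#define PAGE_SHIFT  12
#define COLOUR_BITS 2

static unsigned page_colour(uint64_t addr) {
    return (unsigned)((addr >> PAGE_SHIFT) & ((1u << COLOUR_BITS) - 1));
}

int main(void) {
    uint64_t vaddr = 0x7f1234560000, paddr = 0x00000abc0000;
    /* A colouring allocator would only pick a frame with matching colour,
     * so these two should agree. */
    printf("virtual colour %u, physical colour %u\n",
           page_colour(vaddr), page_colour(paddr));
    return 0;
}
```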

(In the Linux kernel, PAGE_SHIFT is the shift count such that addr >> PAGE_SHIFT shifts out all the offset-within-page bits, leaving the page number for a non-hugepage. "Page split" is also used to describe where the break is between the page-number and offset-within-page bits of an address; it's the number of offset-within-page bits in an address.)
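In code terms, the split looks like this (assuming standard 4 KiB pages, so PAGE_SHIFT is 12):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages */
#define PAGE_SIZE  (1ul << PAGE_SHIFT)

int main(void) {
    uint64_t addr = 0x7ffd12345678;      /* arbitrary example address */
    uint64_t page_number = addr >> PAGE_SHIFT;     /* shifts out the offset */
    uint64_t offset      = addr & (PAGE_SIZE - 1); /* offset-within-page    */
    printf("page number 0x%llx, offset 0x%llx\n",
           (unsigned long long)page_number, (unsigned long long)offset);
    return 0;
}
```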


But outer caches (e.g. L2 and L3) use more index bits. They work purely on physical addresses (PIPT), with virtual-to-physical translation happening in parallel with the L1d access, so translation has already completed before the L2 access in case of an L1 miss. There's no speed benefit to making them VIPT, so PIPT is pretty much universal for levels of cache other than L1.

This does make indexing (and thus aliasing the same set) depend on whether physical addresses are contiguous or not. For a single contiguous array, every line within a region of size cache_size / associativity maps to a different set, so they can't conflict. E.g. for Skylake's 256 KiB 4-way L2 cache, that region is 64K, i.e. 1024 sets of 64-byte lines.
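A quick sketch with those Skylake numbers: the 1024 lines of any physically contiguous 64 KiB region each land in a distinct set (the base address here is arbitrary).

```c
#include <stdio.h>
#include <stdint.h>

/* Skylake-like L2: 256 KiB, 4-way, 64 B lines -> 1024 sets, so a 64 KiB
 * physically contiguous region covers every set exactly once. */
#define LINE 64
#define SETS 1024

int main(void) {
    uint64_t base = 0x40000000;   /* any physically contiguous base */
    unsigned first = (unsigned)((base / LINE) % SETS);
    unsigned last  = (unsigned)(((base + 64 * 1024 - LINE) / LINE) % SETS);
    /* 64 KiB / 64 B = 1024 lines: the indices walk through all 1024 sets. */
    printf("first line -> set %u, last line -> set %u\n", first, last);
    return 0;
}
```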

Large shared last-level caches (L3 in typical x86) often index using a hash function of all the high bits of the physical address, so they're less sensitive to whether physical addresses are contiguous, and resistant to conflict misses even if a lot of the "hot" cache lines are, for example, in the first 64 bytes of various different pages scattered around. (Unlike L1d and L2 caches, where those lines could only be cached by one (L1d) or a few (L2) of the total sets, leaving the others unused.) For distributed L3 caches (like Intel's, where there's a slice next to each core), a good hash function hopefully also avoids making one stop on the ring bus or mesh too hot.
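Intel's real index/slice hash functions are undocumented, so as a purely illustrative stand-in, here's an XOR-fold of the high physical-address bits into the set index. It's not any real CPU's hash; it just shows how hashing stops lines at the same offset-within-page from all piling into the same few sets:

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative hash index only, NOT a real L3 hash: XOR-fold all the
 * higher address bits down into the 11-bit set-index field. */
#define LINE 64
#define SETS 2048   /* 2048 sets = 11 index bits */

static unsigned hashed_set(uint64_t paddr) {
    uint64_t line = paddr / LINE;
    unsigned idx = 0;
    while (line) {                 /* fold 11 bits at a time */
        idx ^= (unsigned)(line & (SETS - 1));
        line >>= 11;
    }
    return idx;
}

int main(void) {
    /* Lines at offset 0 of scattered pages: with plain low-bit indexing
     * they'd share a set, but the hash spreads them out. */
    for (int i = 0; i < 4; i++) {
        uint64_t paddr = (uint64_t)(0x1234 + i * 7) << 12;
        printf("page frame %d -> set %u\n", i, hashed_set(paddr));
    }
    return 0;
}
```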