In C, if you have this type of loop:
for (i = 0; i < N; i++) sum += a[i];
where the array 'a' contains ints (4 bytes each) and a cache block holds, say, 32 bytes, then I know there will be a cold miss every 8 iterations of the loop, since the processor loads 8 ints into a block and doesn't miss again until the 9th iteration. Am I understanding that correctly: when it gets a cache miss at a[0], it loads a[0]-a[7] into a cache block, and won't load any more of 'a' into the cache until it gets another cold miss at a[8]?
Assuming that ^^ is correct, my real question is, what happens if you have something like this:
for (i = 0; i < N; i++) a[i] = a[i+1];
where 'a' has not been initialized? Would you get something similar to above, where the processor looks for each consecutive value of a[i+1] and misses only every 8? Or does it search the cache for a[i] as well in order to set the value? Would there be cache misses associated with a[i] or just a[i+1]?
And finally, what would happen if you had
for (i = 0; i < N; i++) b[i] = a[i];
Would this be analogous to the first example, where it looks for each value of a[i] and gets cache misses on every 8th iteration, or does setting the value of b[i] incur cache misses as well?