Why is there a disparity in execution time for the same array-summing code?


If I run the following program, and then run it again after swapping i and j in sum += arr[i][j], the execution times are very different: about 2.7 seconds before the swap and 9.8 seconds after it. I just cannot understand why. Can someone give me an idea of what causes this?

#include <iostream>
#include <ctime>
using namespace std;

int main()
{
    long long sum = 0;
    int size = 1024;
    clock_t start, end;
    double msecs;
    start = clock();

    // Allocate a size x size matrix row by row; each row is
    // zero-initialized so the summation reads defined values.
    int **arr = new int*[size];
    for (int i = 0; i < size; i++)
    {
        arr[i] = new int[size]();
    }

    // Repeat the summation 1000 times so the run is long enough to time.
    for (int kk = 0; kk < 1000; kk++)
    {
        sum = 0;
        for (int i = 0; i < size; i++)
        {
            for (int j = 0; j < size; j++)
            {
                sum += arr[i][j];   // swapping i and j here changes the timing
            }
        }
    }

    end = clock();
    msecs = ((double) (end - start)) * 1000 / CLOCKS_PER_SEC;
    cout << msecs << endl << endl;

    return 0;
}

1 Answer

Answer by brokenfoot (accepted):

That is due to spatial locality. When your program needs some data from memory, the processor reads not only that specific data but also the neighboring data. So in the next iteration, when you need the next piece of data, it is already in your cache.

In the other case, your program can't take advantage of spatial locality since you are not reading the neighboring data in consecutive iterations.

Say your data is laid out in memory like this:

  0  1  2  3  4  5  6  7  8  9 
 10 11 12 13 14 15 16 17 18 19
 20 21 22 23 24 25 26 27 28 29

When your program needs to read, say, the data labeled 0, it reads the entire row:
0 1 2 3 4 5 6 7 8 9

So when you need the data labeled 1, it is already in the cache and your program runs faster.

If, on the other hand, you read the data column-wise, this doesn't help you: each access misses the cache and the processor has to go back to memory again.

In short, memory reads are costly; caching the neighboring data is the processor's way of optimizing reads to save time.
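
To see the effect directly, here is a minimal sketch (not from the original answer) that times both traversal orders over a single contiguous buffer using std::chrono. It stores the matrix in one std::vector indexed as m[i * size + j], which is an assumption made for illustration; the question's int** allocation behaves similarly because each row is still contiguous.

#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    const int size = 1024;

    // One contiguous block, indexed as m[i * size + j], so row i's
    // elements sit next to each other in memory (row-major layout).
    std::vector<int> m(static_cast<size_t>(size) * size, 1);

    long long sum = 0;

    // Row-wise traversal: the inner index j walks consecutive addresses,
    // so each cache line fetched from memory is fully used.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < size; ++i)
        for (int j = 0; j < size; ++j)
            sum += m[static_cast<size_t>(i) * size + j];
    auto t1 = std::chrono::steady_clock::now();

    // Column-wise traversal: consecutive inner iterations jump ahead by
    // size * sizeof(int) bytes, so most accesses miss the cache line
    // that was just fetched.
    for (int j = 0; j < size; ++j)
        for (int i = 0; i < size; ++i)
            sum += m[static_cast<size_t>(i) * size + j];
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "row-wise:    " << ms(t1 - t0).count() << " ms\n"
              << "column-wise: " << ms(t2 - t1).count() << " ms\n"
              << "(sum = " << sum << ", printed so the loops aren't optimized away)\n";
    return 0;
}

On a typical machine the row-wise loop should come out noticeably faster, for the reason described above; the exact ratio depends on the cache-line size, the matrix size, and the optimization level.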