data locality for implementing 2d array in c/c++

1.5k views Asked by At

Long time ago, inspired by "Numerical recipes in C", I started to use the following construct for storing matrices (2D-arrays).

double **allocate_matrix(int NumRows, int NumCol)
{
  double **x;
  int i;

  x = (double **)malloc(NumRows * sizeof(double *));
  for (i = 0; i < NumRows; ++i) x[i] = (double *)calloc(NumCol, sizeof(double));
  return x;
}

double **x = allocate_matrix(1000,2000);
x[m][n] = ...;

But recently noticed that many people implement matrices as follows

double *x = (double *)malloc(NumRows * NumCols * sizeof(double));
x[NumCol * m + n] = ...;

From the locality point of view the second method seems perfect, but has awful readability... So I started to wonder, is my first method with storing auxiliary array or **double pointers really bad or the compiler will optimize it eventually such that it will be more or less equivalent in performance to the second method? I am suspicious because I think that in the first method two jumps are made when accessing the value, x[m] and then x[m][n] and there is a chance that each time the CPU will load first the x array and then x[m] array.

p.s. do not worry about extra memory for storing **double, for large matrices it is just a small percentage.

P.P.S. since many people did not understand my question very well, I will try to re-shape it: do I understand right that the first method is kind of locality-hell, when each time x[m][n] is accessed first x array will be loaded into CPU cache and then x[m] array will be loaded thus making each access at the speed of talking to RAM. Or am I wrong and the first method is also OK from data-locality point of view?

5

There are 5 answers

0
crazypeter On

The two methods are quite different.

  • While the first method allows for easier direct access to the values by adding another indirection (the double** array, hence you need 1+N mallocs), ...
  • the second method guarantees that ALL values are stored contiguously and only requires one malloc.

I would argue that the second method is always superior. Malloc is an expensive operation and contiguous memory is a huge plus, depending on the application.

In C++, you'd just implement it like this:

std::vector<double> matrix(NumRows * NumCols);
matrix[y * numCols + x] = value;  // Access

and if you're concerned with the inconvenience of having to compute the index yourself, add a wrapper that implements operator(int x, int y) to it.

You are also right that the first method is more expensive when accessing the values. Because you need two memory lookups as you described x[m] and then x[m][n]. There is no way the compiler will "optimize this away". The first array, depending on its size, will be cached, and the performance hit may not be that bad. In the second case, you need an extra multiplication for direct access.

0
nakiya On

In the first method you use, the double* in the master array point to logical columns (arrays of size NumCol).

So, if you write something like below, you get the benefits of data locality in some sense (pseudocode):

foreach(row in rows):
    foreach(elem in row):
        //Do something

If you tried the same thing with the second method, and if element access was done the way you specified (i.e. x[NumCol*m + n]), you still get the same benefit. This is because you treat the array to be in row-major order. If you tried the same pseudocode while accessing the elements in column-major order, I assume you'd get cache misses given that the array size is large enough.

In addition to this, the second method has the additional desirable property of being a single contiguous block of memory which further improves the performance even when you loop through multiple rows (unlike the first method).

So, in conclusion, the second method should be much better in terms of performance.

6
Paul R On

For C-style allocations you can actually have the best of both worlds:

double **allocate_matrix(int NumRows, int NumCol)
{
  double **x;
  int i;

  x = (double **)malloc(NumRows * sizeof(double *));
  x[0] = (double *)calloc(NumRows * NumCol, sizeof(double)); // <<< single contiguous memory allocation for entire array
  for (i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCols;
  return x;
}

This way you get data locality and its associated cache/memory access benefits, and you can treat the array as a double ** or a flattened 2D array (array[i * NumCols + j]) interchangeably. You also have fewer calloc/free calls (2 versus NumRows + 1).

1
Stefan Monov On

No need to guess whether the compiler will optimize the first method. Just use the second method which you know is fast, and use a wrapper class that implements for example these methods:

double& operator(int x, int y);
double const& operator(int x, int y) const;

... and access your objects like this:

arr(2, 3) = 5;

Alternatively, if you can bear a little more code complexity in the wrapper class(es), you can implement a class that can be accessed with the more traditional arr[2][3] = 5; syntax. This is implemented in a dimension-agnostic way in the Boost.MultiArray library, but you can do your own simple implementation too, using a proxy class.

Note: Considering your usage of C style (a hardcoded non-generic "double" type, plain pointers, function-beginning variable declarations, and malloc), you will probably need to get more into C++ constructs before you can implement either of the options I mentioned.

0
Klitos Kyriacou On

If NumCol is a compile-time constant, or if you are using GCC with language extensions enabled, then you can do:

double (*x)[NumCol] = (double (*)[NumCol]) malloc(NumRows * sizeof (double[NumCol]));

and then use x as a 2D array and the compiler will do the indexing arithmetic for you. The caveat is that unless NumCol is a compile-time constant, ISO C++ won't let you do this, and if you use GCC language extensions you won't be able to port your code to another compiler.