How is a 3D/4D array stored with column-major order contiguously in memory?

125 views Asked by At

While I have no trouble understanding 2-dimensional arrays, I want to know how precisely MATLAB (or modified NumPy) stores higher-dimensional arrays in memory. To avoid abstract answers, I ask my question about a concrete example of a 3D array and a 4D one:

The 3D array, which we call arr3D can be visualized as two 3-by-3 matrices A and B where B is stacked on top of A like pages of a (faced up) book are stacked on top of each other.

Page 1 is:

1 2 3
4 5 6
7 8 9

And page 2 is:

10 20 30
40 50 60
70 80 90

I know that if the storage order is row-major, arr3D is stored in memory like this:

|1|2|3|4|5|6|7|8|9|10|20|30|40|50|60|70|80|90|

How would arr3D be stored in memory if the order is column-major?

Suppose we had another 3D array called arr3D2 which has the same dimensions as arr3D and has elements that are all double the corresponding elements in arr3D.

How would the 4D array [arr3D,arr3D2] be stored in memory if the storage order is row-major?

In NumPy:

import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
B = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr3D = np.array([A,B])

In MATLAB:

A = [1 2 3;4 5 6; 7 8  9];
B = [10 20 30;40 50 60; 70 80  90];
arr3D(:,:,1) = A;
arr3D(:,:,1) = B;
2

There are 2 answers

0
Cris Luengo On

The terms “column major” and “row major” really only apply to 2D arrays, as they specify only which dimension is contiguous in memory, and leave ambiguous the ordering of the remaining dimensions.

NumPy does C ordering or Fortran ordering (C is the default).

  • C ordering means that the last dimension is contiguous in memory, then the second to last, etc, with the first dimension being most distant. That is, arr[0,0,0] is followed by arr[0,0,1] , arr[0,0,2], etc, then comes arr[0,1,0], arr[0,1,1], etc. In 2D this corresponds to row major (because the last dimension is the row in NumPy).

  • F ordering is the reverse. arr[0,0,0] is followed by arr[1,0,0] , arr[2,0,0], etc, then comes arr[0,1,0], arr[1,1,0], etc. In 2D this corresponds to column major (because the first dimension is the column in NumPy).

MATLAB uses Fortran ordering, but the dimensions are interpreted differently than in NumPy. For a 2D array it’s the same, but NumPy puts the pages of the 3D array as the first dimension, whereas MATLAB makes them last. So in NumPy the columns are always the 2nd dimension from the end, whereas in MATLAB they’re always the first dimension. But this is just an interpretation difference, not a storage or indexing difference.

PS: actually it’s possible to construct arrays with any arbitrary storage order in NumPy, by reordering the dimensions of an existing array.

0
hpaulj On

Here's an example of constructing a 3d array from a 1d arange. As long as the operations just make a view, their base will always be x.

In [2]: x = np.arange(1,19)

The regular 'C' order:

In [3]: x.reshape(2,3,3)
Out[3]: 
array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

And make it column major; note how the 1,2 sequence jumps across planes, the first dimension:

In [4]: x.reshape(2,3,3, order='F')
Out[4]: 
array([[[ 1,  7, 13],
        [ 3,  9, 15],
        [ 5, 11, 17]],

       [[ 2,  8, 14],
        [ 4, 10, 16],
        [ 6, 12, 18]]])

You can ravel any of these. But read its docs to see what it does about order. With the default C order, the order is different from x (this is copy):

In [5]: x.reshape(2,3,3, order='F').ravel()
Out[5]: 
array([ 1,  7, 13,  3,  9, 15,  5, 11, 17,  2,  8, 14,  4, 10, 16,  6, 12,
       18])

But with K (or F) we get the original x order:

In [6]: x.reshape(2,3,3, order='F').ravel(order='K')
Out[6]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])

Underlying all these multidimensional shapes are the strides, how many bytes of the 1d databuffer it has to step to travers the dimensions:

In [7]: x.reshape(2,3,3, order='F').strides
Out[7]: (4, 8, 24)    
In [8]: x.reshape(2,3,3, order='C').strides
Out[8]: (36, 12, 4)

For C order, the first dimension is 'slowest', and has the biggest stride. For F it's the last that is slowest.

But with transpose (or swapaxes) it's possible to make an order that is neither C or F:

In [9]: x.reshape(2,3,3, order='C').transpose(0,2,1)
Out[9]: 
array([[[ 1,  4,  7],
        [ 2,  5,  8],
        [ 3,  6,  9]],

       [[10, 13, 16],
        [11, 14, 17],
        [12, 15, 18]]]) 

Note how the 1,2,3 sequence is the column of the first plane. Ignoring the first dimension, that almost looks column major. strides aren't in strict order, increasing or decreasing.

In [10]: x.reshape(2,3,3, order='C').transpose(0,2,1).strides
Out[10]: (36, 4, 12)    
In [12]: x.reshape(2,3,3, order='C').transpose(0,2,1).flags
Out[12]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False