Is a two-dimensional array implemented as a continuous one-dimensional array?

565 views Asked by At

I have a question about the memory layout of a two-dimensional array. When we define one, just like int a[3][4], is the memory allocated to this array continuous?

Or in other words, is a two-dimensional array implemented as a continuous one-dimensional array?

If the answer is yes, is accessing a[0][6] equivalent to accessing a[1][2]?

I wrote the following C program.

#include <stdio.h>
int main(){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    printf("%d %d\n", a[0][6], a[1][2]);
    return 0;
}

I find that the output is 7 7.

a[0][6] seems illegal, but it points to a[1][2], I want to know why and is such operation legal?

3

There are 3 answers

4
dbush On BEST ANSWER

This is as interesting case. Section 6.2.5p20 of the C standard defines an array as follows:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called ‘‘array of T ’’. The construction of an array type from an element type is called ‘‘array type derivation’’

So an array is a contiguous set of objects of a particular type. In the case of int a[3][4] it is an array of size 3 whose objects are also arrays. The type of the subarray is int [4], i.e. an array of size 4 of type int.

This means that a 2D array, or more accurately an array of arrays, does indeed have all of the individual members of the inner array laid out continuously.

This does not however mean that the above array can be accessed as a[0][6] as in your example.

Section 6.5.2.1p2 regarding the array subscript operator [] states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

And section 6.5.6p8 regarding the + operator when applied to a pointer operand states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

There's a lot to ingest here, but the important part is that given an array of size X, the valid array indices range from 0 to X-1, and attempting to use any other index triggers undefined behavior. In particular, since a[0] has type int [4], attempting to access a[0][6] is going outside the bounds of the array a[0].

So while a[0][6] in practice would probably work, the C standard makes no guarantee that it will. And given that modern optimizing compilers will aggressively assume undefined behavior doesn't exist in a program and optimize based on that fact, you could end up in a situation where something goes wrong and you don't know why.

To summarize: yes 2D arrays are implemented that way, and no you can't access them like that.

13
Lundin On

@dbush has written a good, correct answer explaining what's guaranteed and allowed. The TL;DR is basically: yes it is contiguous but still no guarantees are made that allow you to reliably access any (sub) array out of bounds. Any pointer to an item needs to point within a valid array in order to use pointer arithmetic or the [] operator on it.

This answer adds some possible work-arounds to solve the problem.

One work-around is to use a union and "type pun" between a 2D array and a 1D array:

#include <stdio.h> 

typedef union
{
  int arr_2d [3][4];
  int arr_1d [3*4];
} arr_t;

int main (void)
{
  arr_t a = 
  { 
    .arr_2d =  
    {
      { 1, 2, 3, 4},
      { 5, 6, 7, 8},
      {9, 10, 11, 12}
    }
  };

  printf("%d %d\n", a.arr_1d[6], a.arr_1d[(1*4)+2]); // well-defined, union type punning
  return 0;
}

Another universal work-around is that it's fine to inspect any variable in C as a chunk of raw data by using character types, in which case you can regard the whole variable as an array of bytes:

#include <stdio.h>
int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    unsigned char* ptr = (unsigned char*)a;

    // well-defined: byte-wise pointer arithmetic on a character type
    // well-defined: dereferencing as int, since the effective type of the memory location is int:
    int x = *(int*)&ptr[sizeof(int)*6];
    int y = *(int*)&ptr[sizeof(int[4])*1 + sizeof(int)*2];

    printf("%d %d\n", x, y);
    return 0;
}

Or in case you are a fan of obscure macros, a rewrite of the previous example (not really recommended):

#include <stdio.h>

#define GET_ITEM(arr, x, y) \ // see 1)
  _Generic( **(arr),        \
            int:  *(int*) &((unsigned char*)(arr))[sizeof(*arr)*(x) + sizeof(int)*(y)] ) 

int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    unsigned char* ptr = (unsigned char*)a;

    // well-defined:
    printf("%d %d\n", GET_ITEM(a,0,6), GET_ITEM(a,1,2));
    return 0;
}

1) Explanation: _Generic for type safety. Cast to a character type, do byte-wise pointer arithmetic based on the size of a 1D array of int for x and the size of an int for y. Precedence goes as [] over &. The & is to get an address, then cast to an int* and dereference. The value will be returned by the macro.

6
Mark Benningfield On

Back in the wild days, before compilers got serious about emitting diagnostics for doing (very likely) stupid things, this type of code was not uncommon.

NOTE: DO NOT DO THIS

#include <stdio.h>

int main(int argc, char* argv[]) {
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};

    // treat the array as a single, contiguous arrangement
    // will throw compiler warning/error
    int *p = &a; // <=== VERY BAD, invalid pointer assignment!!

    printf("%d %d\n", p[6], a[1][2]);
    return 0;
}

This demonstrates that the array is in fact laid out contiguously in row-major order. This is one of those things about C that if you really want it to, it will let you shoot yourself right in the foot.