Strange behaviour with pointer arithmetic in C

Can someone please explain this strange pointer behaviour? Did I miss something?

start first attempt

int *foo=(int*)malloc(sizeof(int)*4);//allocates memory for 4 integer with size of sizeof(int) foo points to the address of the first element-  foo=&foo[0]

for (int i = 0; i < 4; i++)
{
    *foo=i+1;
    foo++;
    printf("foo[%d] :%d \n",i,foo[i]);
}

Output is something like this

foo[0] :0
foo[1] :0
foo[2] :0
foo[3] :0

end first attempt

Let's add one tiny thing to our code:

start second attempt

int *foo=(int*)malloc(sizeof(int)*4);//same as first attempt

int* ptr=foo;//???????????????????? why this is necessary?

printf("Value of foo: %d value of ptr is: %d &foo[0] %d\n",foo,ptr, &foo[0]);

for (int i = 0; i < 4; i++)
{
    *ptr=i+1;//assign and update via ptr
    ptr++;
    printf("foo[%d] :%d \n",i,foo[i]);
}

Output

Value of foo: 1322628816 value of ptr is: 1322628816 &foo[0] 1322628816
foo[0] :1
foo[1] :2
foo[2] :3
foo[3] :4

end second attempt

Note: I use Ubuntu 18.04, gcc (Ubuntu 11.4.0-2ubuntu1~18.04) 11.4.0 with GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1.6) stable release version 2.27.

I tried exactly what I wrote above.

There are 5 answers

0
Barmar On

After you do foo++, foo no longer points to the beginning of the array; it points to element i+1 of the array. When you then access foo[i], you're accessing the element i places after that one, i.e. original_foo[i+1+i].

It works in the second version because the pointer you're indexing is not the one you're incrementing. So foo continues to point to the beginning of the array, and foo[i] is the element you just assigned.

0
DevSolar On
*foo=i+1;

This initializes the first int of the four-int area you malloc()ed.

foo++;

This makes foo point to the second int.

printf("foo[%d] :%d \n",i,foo[i]);

You're printing *(foo + i), i.e. the second (uninitialized) int, invoking UB.

Things get only worse from there. The second iteration of your loop (with i == 1)...

*foo=i+1;

foo still points to the second int, so you write 2 there...

foo++;

foo now points to the third int...

printf("foo[%d] :%d \n",i,foo[i]);

...and you print *(foo + i), that is the int one past the third int -- the fourth int, which again is uninitialized (again, undefined behavior).

In the third and fourth iteration of your loop, you end up reading from memory you haven't even malloc()ed.


Your second example works since you increment ptr for writing, and i (as index on foo) for reading. As you no longer double-increment, that second example works.

The correct solution for your first example, of course, would be to use foo[i] consistently and not increment foo.

0
Ted Lyngmo On
for (int i = 0; i < 4; i++)
{
    *foo=i+1;                           // 1
    foo++;                              // 2
    printf("foo[%d] :%d \n",i,foo[i]);  // 3
}
  1. Here you assign i+1 to the int foo points at. That in itself is not wrong.
  2. Here you make foo point at the next element in the allocated memory. You've now lost the base pointer - unless you later restore it by counting backwards or saving the pointer before the loop starts.
  3. In foo[i] you add yet another i to the already advanced pointer. You've now effectively made it *(original_foo + (i + 1) + i), which very soon becomes out of bounds.

An idiomatic solution would be to not change foo at all:

for (int i = 0; i < 4; i++) {
    foo[i] = i + 1;
    printf("foo[%d] :%d \n", i, foo[i]);
}

Regarding:

int* ptr=foo;//???????????????????? why this is necessary?

That's the saving of the base pointer I mentioned as a possible workaround in point 2 above. In the following loop, you only change ptr and let the base pointer foo be untouched. Dereferencing foo[i] (same as *(foo + i)) then does the correct thing.

for (int i = 0; i < 4; i++)
{
    *ptr=i+1;                          // 1
    ptr++;                             // 2
    printf("foo[%d] :%d \n",i,foo[i]); // 3
}
  1. Again, assign i+1 to where ptr points.
  2. Step the temporary int* one step ahead.
  3. Dereference foo[i] (*(foo + i)) which dereferences the same address that ptr pointed at prior to ptr++.
0
Vlad from Moscow On

Let's first consider the first code snippet

int *foo=(int*)malloc(sizeof(int)*4);//allocates memory for 4 integer with size of sizeof(int) foo points to the address of the first element-  foo=&foo[0]

for (int i = 0; i < 4; i++)
{
    *foo=i+1;
    foo++;
    printf("foo[%d] :%d \n",i,foo[i]);
}

First, you allocate memory for 4 objects of type int using the function malloc

int *foo=(int*)malloc(sizeof(int)*4);//allocates memory for 4 integer with size of sizeof(int) foo points to the address of the first element-  foo=&foo[0]

The allocated memory is uninitialized. It can contain all zeroes or some random values. That is, the memory contains indeterminate values.

In the for loop

for (int i = 0; i < 4; i++)
{
    *foo=i+1;
    foo++;
    printf("foo[%d] :%d \n",i,foo[i]);
}

you are assigning values i + 1 to each object

*foo=i+1;

but then you are increasing the pointer

foo++;

and moreover, by indexing the already advanced pointer with foo[i], you are trying to output elements that were never assigned, and some that are not even inside the allocated memory.

As a result, the code reads uninitialized memory and even memory outside the allocation. So this call of printf

printf("foo[%d] :%d \n",i,foo[i]);

invokes undefined behavior.

How to change the for loop to output valid values?

Just do not increase the pointer. That is write

for (int i = 0; i < 4; i++)
{
    foo[i] = i+1;
    printf("foo[%d] :%d \n",i,foo[i]);
}

In this case the pointer foo still holds the address of the start of the allocated memory, so you can free the allocated memory after the loop like

free( foo );

It just so happened that the memory you were accessing contained zeroes. However, in general the output can differ from this

foo[0] :0
foo[1] :0
foo[2] :0
foo[3] :0

and can display any random values.

In the for loop of the second code snippet

for (int i = 0; i < 4; i++)
{
    *ptr=i+1;//assign and update via ptr
    ptr++;
    printf("foo[%d] :%d \n",i,foo[i]);
}

at the point of the assignment, the expressions *ptr and foo[i] designate the same object.

Here i is the index of the element in the array. When i is 0, *ptr and foo[0] yield the same element. After i is incremented, foo[1] is evaluated as *( foo + 1 ). That is the same object that *ptr designates after ptr++, because ptr++ sets ptr to ptr + 1. In other words, the expression foo[1], which is the same as *( foo + 1 ), corresponds to *( ptr = ptr + 1 ) or, written with the comma operator, *( ptr++, ptr ).

From the C Standard (6.5.2.1 Array subscripting)

2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

0
Tom On

This is not strange behaviour.

The approach in the first attempt is wrong. A correct approach is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    //allocates memory for 4 integer with size of sizeof(int) foo points 
    //to the address of the first element-  foo=&foo[0]
    int *foo=(int*)malloc(sizeof(int)*4);

    if(foo == NULL){
        printf("malloc memory failed\n");
        return -1;
    }
    memset(foo, 0, sizeof(int)*4);
    for (int j = 0; j < 4; j++){
        printf("foo[%d] address is %p\n", j, (void *)&foo[j]); /* %p expects void * */
    }
    printf("\n");
    for (int i = 0; i < 4; i++)
    {
            foo[i]=i+1;
            printf("foo[%d] :%d \n",i,foo[i]);
    }
    free(foo);
    return 0;
}

Running it outputs:

foo[0] address is 0x5635caaac2a0
foo[1] address is 0x5635caaac2a4
foo[2] address is 0x5635caaac2a8
foo[3] address is 0x5635caaac2ac

foo[0] :1 
foo[1] :2 
foo[2] :3 
foo[3] :4 

The second attempt is correct:

int* ptr=foo;//???????????????????? why this is necessary?

The type of ptr is int*, and its value is the address &foo[0]. So ptr + 1 is the address sizeof(int) bytes past &foo[0] (4 bytes where int is 4 bytes, as in the addresses printed above), that is, &foo[1].

After int *foo=(int*)malloc(sizeof(int)*4);, you can treat foo much like int foo[4] for indexing purposes. For more details about arrays, see Why subtracting the addresses of consecutive values in the array gives different values, when stored and not stored in variables?