Use of pointers on strings


I am really confused about the use of pointers on strings. It feels like they obey different rules. Consider the following code

  1. char *ptr = "apple";// perfectly valid here not when declaring afterwards like next
    
    ptr = "apple"; // shouldn't it be *ptr = "apple"
    
  2. Also printf() behaves differently -

    printf("%s", ptr) // Why should I send  the address instead of the value
    
  3. Also I came across the following code in a book

    char str[]="Quest";
    char *p="Quest";
    
    str++; // error, constant pointer can't change
    
    *str='Z'; // works, because pointer is not constant
    
    p++; // works, because pointer is not constant
    
    *p = 'M'; // error, because string is constant
    

I can't understand what this is supposed to imply.

Please help, I can't find any info anywhere else

6

There are 6 answers

8
perror On BEST ANSWER
char *ptr;
ptr = "apple"; // shouldn't it be *ptr =      "apple"

No, because *ptr is a char. So you could write *ptr = 'a' (once ptr points to writable memory), but you can't write *ptr = "apple" as you suggest.

printf("%s", ptr) // Why should I send  the address instead of the value

Because a string in C is handled through the address of a sequence of characters (char) terminated by zero (the null character, aka '\0').

char str[] = "Quest";
char *p = "Quest";

str++; // error, constant pointer can't change

No, pointers can change just fine. Here, however, str is an array, which is not the same thing as a pointer: an array name is not a modifiable lvalue, so str++ does not compile.

*str='Z'; // works, because pointer is not constant

No, it works because *str is a char, and str is a writable array.

p++; // works, because pointer is not constant

No, it works because, this time, this is a pointer (not an array).

*p = 'M'; // error, because string is constant

Same as above: *p is a char, so the assignment has the right type and compiles. Type-correctness is not the problem here. As stated by Michael Walz in the comments, even though it compiles, it invokes undefined behavior at runtime: the specification does not say whether string literals are stored in read-only memory, and it leaves any attempt to modify one undefined. Most modern implementations do make them read-only, so the most likely result is a crash with a segfault.

For more information, refer to this SO question.

0
SPlatten On

"*" when used with a pointer means get the contents of what the pointer points to, in the case of:

    char* ptr;

ptr is a pointer to a character, you can assign it to a string like so:

   const char* ptr = "test";

The layout in memory of this is 't' followed by 'e', 's', 't' and then finally a nul terminator '\0'.

When you initialize ptr as above, it receives the address of the first memory location, which happens to hold 't'.

*ptr returns the contents of what ptr points to, and always has the size of the type ptr was declared to point to; in this example that is a char, a single byte.

*(++ptr) would return 'e', as ptr is incremented to the next location before returning the contents of what it now points to.

0
Masked Man On
  1. "SOME STRING" creates a char sequence in memory ending in \0 and returns its first char address so you can assign it to a pointer:
    char *ptr = "Hello";

  2. The printf function also works with addresses, and the conversion specifier defines how it should read the data from memory.

  3. char str[]="Quest"; char *p="Quest";
    In the first one you are creating an array with 6 elements and storing 'Q', 'u', 'e', 's', 't', '\0' in it. You can then change an element, e.g. str[2] = 'x', but the array name itself is not a modifiable variable: it always refers to the array's first location, so you can't change it with something like str++;
    In the second one, "Quest" (with its terminating '\0') is a constant string saved somewhere in memory, and the address of its first character is stored in p. You can't change the string, but the pointer itself is not const, so you can do p++;.

0
John Bode On

ptr = "apple"; // shouldn't it be *ptr = "apple"

Starting from the beginning...

The string literal "apple" is stored in a 6-element array of char, like so:

+---+---+---+---+---+---+
|'a'|'p'|'p'|'l'|'e'| 0 |
+---+---+---+---+---+---+

The trailing 0 marks the end of the string (it's called the string terminator).

When an expression of type "N-element array of T" appears in an expression, it will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array, unless the array expression is the operand of the sizeof or unary & operators, or is used to initialize a character array in a declaration.

Thus, in the statement

ptr = "apple";

the expression "apple" is converted ("decays") from an expression of type "6-element array of char" to "pointer to char". The type of the expression ptr is char *, or "pointer to char"; thus, in the assignment above, ptr will receive the address of the first element of "apple".

It should not be written as

*ptr = "apple";

since the expression *ptr evaluates to the value of the thing ptr points to, which at this point is a) indeterminate, and b) the wrong type for the assignment. The type of the expression *ptr is char, which is not compatible with char *.

I've written a utility that prints a map of items in memory; given the code

char *ptr = "apple";
char arr[] = "apple";

the map looks something like this:

       Item         Address   00   01   02   03
       ----         -------   --   --   --   --
      apple        0x400c80   61   70   70   6c    appl
                   0x400c84   65   00   70   74    e.pt

        ptr  0x7fffcb4d4518   80   0c   40   00    ..@.
             0x7fffcb4d451c   00   00   00   00    ....

        arr  0x7fffcb4d4510   61   70   70   6c    appl
             0x7fffcb4d4514   65   00   00   00    e...

The string literal "apple" lives at address 0x400c80 [1]. The variables ptr and arr live at addresses 0x7fffcb4d4518 and 0x7fffcb4d4510, respectively [2].

The variable ptr contains the value 0x400c80, which is the address of the first element of the "apple" string literal (x86 stores multi-byte values in "little-endian" order, so the least-significant byte comes first, meaning you have to read right-to-left).

Remember the "except" clause above? In the second declaration, the string literal "apple" is being used to initialize an array of char in a declaration; instead of being converted to a pointer value, the contents of the string literal are copied to the array, which you can see in the memory dump.

  2. printf("%s", ptr) // Why should I send the address instead of the value

Because that's what the %s conversion specifier expects - it takes a pointer to the first character of a 0-terminated string, and will print out the sequence of characters starting at that location until it sees the terminator.

3 ... I can't understand what is supposed to imply

You cannot change the value of an array object. Let's look at what str would look like in memory:

     +---+
str: |'Q'| str[0]
     +---+
     |'u'| str[1]
     +---+
     |'e'| str[2]
     +---+
     |'s'| str[3]
     +---+
     |'t'| str[4]
     +---+
     | 0 | str[5]
     +---+ 

You can write to each str[i] [3] (changing its value), but you cannot write to str because there's nothing to write to. There's no str object separate from the array elements. Even though the expression str will "decay" to a pointer value, no storage is set aside anywhere for that pointer - the conversion is done at compile time.

Similarly, attempting to modify the contents of a string literal invokes undefined behavior [4]; you may get a segfault, or your code may work as expected, or you may wind up launching nukes at Liechtenstein. So you can't write to *p or p[i]; however, you can write a new value to p, pointing it to a different location.


  1. Technically, it's 0x0000000000400c80; the %p specifier drops leading zeros.
  2. Same deal - technically, the values are 0x00007fffcb4d4518 and 0x00007fffcb4d4510. Note that the specific address values will change from run to run.
  3. *str is equivalent to str[0]
  4. The C language definition identifies certain operations which are erroneous, but doesn't place any requirements on the compiler to handle that code in any particular way. Different platforms store string literals in different ways; some put them in read-only memory, so attempting to modify them results in a segfault, while other platforms store them in a writable segment so that the operation succeeds. Some may store them in such a way that you don't get a segfault, but the string isn't changed.

0
savram On

1- I think you are confusing initialization with assignment. This line:

char *ptr = "apple";

declares a pointer to char and initializes it with the address of the first character, 'a'. It is equivalent to the following two lines:

char* ptr;
ptr = "apple";

Now, string literals in C must be treated as read-only. Their type is not const-qualified, but modifying one is undefined behavior, so it is safer to declare the pointer as

const char* ptr;

So in fact, you can not change the contents of the location this pointer points to. Now, even if you could, the way you did it is wrong. Because ptr points to the location of the first character of the string, when you do *ptr you are accessing the contents of the first char of that string. So it expects a char, not a string. So it would be something like: *ptr = 'a';

2- Well, that's the way printf works. If you want to print a string with the %s specifier, it expects a pointer to that string, the address of the string's first character, not the string's value itself.

3- Now I'm going to comment on your code.

str++; // error, constant pointer can't change

You are right that this is an error, but str is an array rather than a constant pointer. The two are closely related: in most expressions the array name is converted to the address of its first element, so it behaves much like an immutable pointer with mutable contents. The array itself does not store that address, though; it is just the sequence of values (sizeof str is 6 here, not the size of a pointer). You can change the contents of the array, but you can't change where it lives.

*str='Z'; // works, because pointer is not constant

Not quite. The array name acts like a constant pointer here, that is, you cannot change the address it refers to. But you can change the content at that address, which is what the line above does: it changes the first value in the array.

p++; // works, because pointer is not constant

Correct. The pointer is not constant, although the content it points to must not be modified. You can change the address the pointer stores, but not the value it points at. p is a mutable pointer to an immutable string.

*p = 'M'; // error, because string is constant

Correct, the string is immutable.

0
Steve Summit On

I'm only going to answer subquestion 1. But you've touched on a frequent but subtle confusion in C, a slight mismatch between the way you initialize a pointer, versus assign to that pointer. Watch carefully.

If I have an int variable, I can either initialize it when I declare it:

int i = 42;

Or, I can declare it on one line (without initializing it), and give it a value later:

int i;
i = 42;

No mystery there. But when pointers are involved, it looks a little different. Again, I can declare and initialize on one line:

char *ptr = "apple";

Or I can split the declaration and the assignment:

char *ptr;
ptr = "apple";

But, that looks weird at first -- based on the first syntax, shouldn't the second way look like this?

*ptr = "apple";         // WRONG

No, it shouldn't, and here's why.

ptr is a pointer to some characters. It is one way of referring to a string in C.

* is the pointer-indirection operator. In an expression, *ptr refers to the character (just the one character) that ptr points to. So if we wanted to fetch the first character of the string, we could use * to do that:

printf("first character: %c\n", *ptr);

Note that the format in this printf call uses %c, because it's just printing one character.

We can also assign pointers. If we're using pointers to char, and if we're therefore thinking of those pointers as "strings", this is one way of doing string assignment in C. If I say

ptr = "apple";

then no matter where ptr used to point, now it points to an array of characters containing the string "apple". And if I later say

ptr = "pear";

then ptr doesn't point to the string "apple" any more; now it points to a different array of characters containing the string "pear". You can think of this pointer assignment sort of as if it's assigning all of the characters of the string at once (although that's not actually what it's doing, at all).

So if *ptr accesses just one character, and ptr is the pointer value itself, then why does the first form

char *ptr = "apple";

work?

The answer is that when you say

char *ptr = "apple";

the * that shows up in there is not the pointer-indirection operator. It is not saying that we are trying to access the first character of anything.

When you say

char *ptr = "apple";

the * is saying that ptr is a pointer. It's just like when you say

char *ptr;

the * is saying that ptr is a pointer.

C's declaration syntax for pointers is a little weird. Here's how to think about it. The syntax is

type-name thing-that-has-that-type ;

So when we say

char *ptr;

the type-name is char, and the thing-that-has-that-type is *ptr. We're saying that *ptr will be a char. And if *ptr will be a char, that means that ptr must be a pointer-to-char.

And then, when we say

char *ptr = "apple";

we're saying that ptr (which we've just got done saying is a pointer-to-char) should have as its initial value a pointer to an array containing the string "apple".