C character array and its length


I am currently studying C with "C Programming Absolute Beginner's Guide" (3rd Edition), which says that every character array should have a size equal to the string length + 1 (the extra element holds the string-terminating zero). But this code:

#include <stdio.h>
int main(void)
{
    char name[4] = "Givi";
    printf("%s\n",name);
    return 0;
}

outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.

Why does my code output Givi and not Giv?

I am using MinGW 4.9.2 SEH for compilation.

5

There are 5 answers

2
Andrei Bârsan On BEST ANSWER

You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.

In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.

+-----------------------+
|G|i|v|i|\0|\0|...      |
+-----------------------+
| your  | rest of       |
| stuff | memory (stack)|
+-----------------------+

Other languages, such as Java, have safeguards against this sort of situation. Languages like C, however, do less hand-holding, which on the one hand allows more flexibility, but on the other gives you many more ways to shoot yourself in the foot with subtle issues such as this one. In other words, just because your code compiles doesn't mean it's correct, or that it won't blow up now, in 5 minutes or in 5 years.

In real programs, the memory after your array is almost never conveniently zeroed. Your string might end up stored right next to other data, which would then get printed out together with it. You never want this. Situations like this can lead to crashes, exploits and leaked confidential information.

See the following diagram for an example. Imagine you're working on a web server and the string "secret" (a user's password or key) is stored right next to your harmless string:

+-----------------------+
|G|i|v|i|s|e|c|r|e|t    |
+-----------------------+
| your  | rest of       |
| stuff | memory (stack)|
+-----------------------+

Every time you output what you think is "Givi", you would end up printing the secret string as well, which is not what you want.

7
mazhar islam On

The following line:

char name[4] = "Givi";

may give a warning like:

string for array of chars is too long

Even though the behavior is undefined, the compiler may still accept it. But if you debug, you will see:

name[0]                   'G'
name[1]                   'i'
name[2]                   'v'
name[3]                   '\0'

And so the output is

Giv

Not Givi as you mentioned in the question!

I'm using GCC compiler.

But if you write something like this:

char name[4] = "Giv";

It compiles fine, and the output is

Giv

0
too honest for this site On

What your book states is basically right, but it is missing the phrase "at least". The array can very well be larger.

You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!

What it exhibits is called undefined behaviour (UB) and might result in demons flying out of your nose. The UB comes from the printf() call, not from the initializer. The C standard explicitly leaves this behaviour undefined, so the compiler (and your libraries) are not expected to behave correctly.

In such cases, no terminator is appended, so the string is not properly terminated when passed to printf().

The reason this does not produce an error is likely legacy code that exploited this to save some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, the compiler simply does not append it. Silently truncating the string literal would also be a bad idea.

0
qwertz On

The byte after the last character always has to be 0; otherwise printf would not know where the string ends and would keep reading bytes (chars) until it happened to find one that is 0.

As Andrei said, it apparently just happened that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.

This can vary from compiler to compiler and thus is undefined behaviour.

For instance, printf could end up accessing an address that your program is not allowed to access. This would result in a crash.

0
Richard Chambers On

In C, text strings are stored as zero-terminated arrays of characters. This means that the end of a text string is indicated by a special character with a numeric value of zero (0).

So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.

The C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters, either because the zero terminator is missing or because a long text string is copied into a smaller buffer. This type of error is known as a buffer overflow error.

The C compiler will perform some types of adjustments for you automatically. For instance:

const char *pText = "four";  // pointer to a text string constant; the compiler automatically adds a zero in an additional array element for the constant "four"
char text1[] = "four";       // compiler creates an array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text2[5] = "four";      // programmer creates an array of 5 elements; compiler puts the characters four in the first four array elements, a value of 0 in the fifth

In the example you provided, a good C compiler should issue at minimum a warning and probably an error. However, it looks like your compiler is truncating the string to the array size and not adding the zero string terminator, and you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway, but that would seem unlikely.