Writing to a location outside of an array


I've just started learning programming, and this is my first post. I'm reading "The C Programming Language" by Kernighan and Ritchie, and I came across an example that I don't understand (section 1.9, p. 30).

This program takes text as input, determines the longest line, and prints it. A char array line[MAXLINE] is declared, where MAXLINE is 1000, so the last element of the array has index MAXLINE-1, which is 999. However, the function getline receives the line[] array as an argument (with MAXLINE as lim), and it appears that if the user enters a line longer than MAXLINE, i will be incremented until i = lim, that is, i = MAXLINE. The statement line[i] = '\0' would then be line[MAXLINE] = '\0'.

This looks wrong to me: how can we write to line[MAXLINE] if the size of line[] is MAXLINE? Wouldn't that be writing to a location outside of the array?

The only explanation I can come up with is that when you declare char array[size], C actually creates a char array[size+1], where the last element is reserved for the null character. If so, this is pretty confusing, and it isn't mentioned in the book. Can anyone confirm this, or explain what's going on?

#include <stdio.h>
#define MAXLINE 1000 /* maximum input line length */
int getline(char line[], int maxline);
void copy(char to[], char from[]);

/* print the longest input line */
main()
{
    int len;                /* current line length */
    int max;                /* maximum length seen so far */
    char line[MAXLINE];     /* current input line */
    char longest[MAXLINE];  /* longest line saved here */

    max = 0;

    while ((len = getline(line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            copy(longest, line);
        }
    if (max > 0) /* there was a line */
        printf("%s", longest);

    return 0;
}

/* getline: read a line into s, return length */
int getline(char s[],int lim)
{
    int c, i;

    for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';

    return i;
}

/* copy: copy 'from' into 'to'; assume to is big enough */
void copy(char to[], char from[])
{
    int i;
    i = 0;

    while ((to[i] = from[i]) != '\0')
        ++i;
}

There are 4 answers

Dennis Meng (best answer)

This for loop appears to be doing the reading in getline:

for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
    s[i] = c;

It looks like i is incremented until it reaches lim - 1, not lim (where lim is equal to MAXLINE in the case you were talking about). Hence, if the line is longer than MAXLINE, the loop stops after storing MAXLINE-1 characters in line[0] through line[MAXLINE-2], and the '\0' is then written to line[MAXLINE-1], the last valid element, like you expect.
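
To watch the loop bound at work without typing a 1000-character line, here is a minimal sketch that runs the same loop over an in-memory string with a tiny buffer. The names next_char and read_from_string, and the src/pos parameters, are assumptions made for illustration; they are not part of the book's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for getchar(): returns the next character of
 * src and advances *pos, or -1 (playing the role of EOF) at the end
 * of the string. */
static int next_char(const char *src, size_t *pos)
{
    unsigned char ch = (unsigned char)src[*pos];

    if (ch == '\0')
        return -1;
    ++*pos;
    return ch;
}

/* Same loop as the K&R function, reading from src instead of stdin. */
int read_from_string(const char *src, size_t *pos, char s[], int lim)
{
    int c = 0, i;

    /* i never goes past lim-1, so at most lim-1 characters are stored */
    for (i = 0; i < lim - 1 && (c = next_char(src, pos)) != -1 && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';    /* i is at most lim-1 here, the last valid index */

    return i;
}
```

Called with a 5-element buffer, lim = 5, and the input "abcdefghij\n", it stores "abcd" in s[0] through s[3], writes the terminator to s[4], and returns 4. The rest of the input is left unread.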

umläute

General answer

Reading or writing outside of allocated memory is undefined behaviour.

In many cases it will lead to the dreaded segmentation fault.

In some cases you might get away with it through sheer luck (e.g. because the memory you accessed happens to exist, physically and logically, and is not used for anything else).

The simple answer is: do not do this! Protect your code against accessing out-of-bounds memory.

C never does any magic like allocating n+1 bytes when you only asked it to allocate n bytes.
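
One way to check this for yourself is sizeof. The sketch below is only an illustration (the function names declared_size and literal_size are made up): an array declared with MAXLINE elements occupies exactly MAXLINE bytes. An "extra" byte only shows up when the size is left out and the array is initialized from a string literal, and there the terminator is part of the literal itself, not something added on top of a size you asked for.

```c
#include <stddef.h>

#define MAXLINE 1000

/* Size in bytes of a char array declared with MAXLINE elements. */
size_t declared_size(void)
{
    char line[MAXLINE];

    return sizeof line;     /* exactly MAXLINE: no hidden extra byte */
}

/* Size of a char array whose length is deduced from a string literal. */
size_t literal_size(void)
{
    char word[] = "abc";

    return sizeof word;     /* 4: 'a', 'b', 'c' and the literal's own '\0' */
}
```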

As for your specific example

for (i=0; i < lim-1 /* ... */ ; ++i)

This will not increment i up to lim: the condition requires i to be smaller than lim-1, so as soon as i reaches lim-1 (which is still a valid index within s[]) the for loop stops.

Devolus

If you look at this line, you can see that the loop stops storing characters at index lim-2, because the condition is i < lim-1:

for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)

If the character was a \n, it is appended. When the line, including its newline, is exactly one byte shorter than the limit, the 0 byte ends up in the last element, s[lim-1]. That is correct, because the 0 byte is also counted in the limit.

Jonathan Leffler

No, I think it is clean.

Note that since the book was written, POSIX has standardized a getline() function with a completely different interface; this can cause some grief, but it is fixable by renaming the function from K&R.
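
As a sketch of that rename, the version below uses the made-up name kr_getline so it cannot collide with the POSIX declaration in <stdio.h>. It also takes a FILE * argument instead of always reading stdin; that is a further assumption, added only so the function can be exercised on a file, and is not in the book.

```c
#include <stdio.h>

/* K&R's getline under a different (hypothetical) name.
 * Assumption: reads from 'in' rather than always from stdin. */
int kr_getline(FILE *in, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = getc(in)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';

    return i;
}
```

In the program from the question, the call getline(line, MAXLINE) would then become kr_getline(stdin, line, MAXLINE), and the prototype at the top of the file would change to match.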

The code is:

int getline(char s[],int lim)
{
    int c, i;

    for (i = 0; i < lim-1 && (c=getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';

    return i;
}

Let's consider 2 cases:

  1. 998 characters followed by newline.
  2. 999 characters followed by newline.

In the first case, when the character before the newline is read, i is 997, which is less than 999 (lim-1), so the getchar() is executed, the character is neither EOF nor newline, and s[997] is assigned, and i is incremented to 998. Since i is still less than 999, the newline is read, and the loop is terminated. Because c is the newline, s[998] is given the newline and i is incremented to 999. Then the assignment s[i] = '\0'; writes to element 999, which is safe.

The analysis in the second case is similar. When the character before the newline is read, i is 998, which is less than 999, so getchar() is executed, the character is neither EOF nor newline, so s[998] is assigned, and i is incremented to 999. Since i is no longer less than 999, the loop exits without reading the newline; since c is not a newline, the body of the if after the loop is not executed; then the null is written to s[999], which is safe.

If EOF is detected before the newline (so the file doesn't end with a newline and technically isn't a text file according to the C standard), the loop is safely broken without overflowing the buffer.

Is there a case that isn't covered?

This is called testing the boundary conditions. It is important to test just below a limit (to make sure it works OK) and at the limit (to ensure it handles that OK). Most of the time, the algorithm doesn't need more than one test just below and one test at the limit; sometimes, if the algorithm handles several numbers on either side of a limit (e.g. average of 3 cells), then you have to do more testing at the upper boundary. Lower boundary testing is also important: testing for 0, 1, 2, ... is very valuable.
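
The two cases above can be turned into an automated check along these lines. This is a sketch under stated assumptions: mem_getc, read_line_mem and terminator_index are hypothetical helpers, the input comes from memory instead of stdin, and the buffer is given one extra guard byte so that a write to s[lim] would be noticed. A guard byte only catches a write to that one position, so it is a sanity check and not a proof.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for getchar() that reads from memory;
 * returns -1 in place of EOF at the end of the string. */
static int mem_getc(const char *src, size_t *pos)
{
    unsigned char ch = (unsigned char)src[*pos];

    if (ch == '\0')
        return -1;
    ++*pos;
    return ch;
}

/* Same loop as the K&R function, with mem_getc in place of getchar. */
static int read_line_mem(const char *src, size_t *pos, char s[], int lim)
{
    int c = 0, i;

    for (i = 0; i < lim - 1 && (c = mem_getc(src, pos)) != -1 && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';
    return i;
}

/* Feed a line of n 'x' characters plus a newline through the reader
 * with limit lim. Returns the index where the '\0' was written,
 * -1 if the guard byte at index lim was overwritten,
 * or -2 if memory could not be allocated. */
int terminator_index(int n, int lim)
{
    const char guard = 0x7f;
    char *input = malloc((size_t)n + 2);
    char *buf = malloc((size_t)lim + 1);   /* lim bytes + 1 guard byte */
    size_t pos = 0;
    int len, result;

    if (input == NULL || buf == NULL) {
        free(input);
        free(buf);
        return -2;
    }
    memset(input, 'x', (size_t)n);
    input[n] = '\n';
    input[n + 1] = '\0';
    memset(buf, guard, (size_t)lim + 1);

    len = read_line_mem(input, &pos, buf, lim);
    result = (buf[lim] == guard && buf[len] == '\0') ? len : -1;

    free(input);
    free(buf);
    return result;
}
```

With lim = 1000, both terminator_index(998, 1000) and terminator_index(999, 1000) give 999, matching the two cases worked through by hand, and a much longer line such as terminator_index(5000, 1000) gives 999 as well.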