Secure Coding with getchar() and Whitespace Padding

525 views Asked by At

This chunk of code is part of a larger project. I tell you this because it is relevant. I'm working on a small (codewise) database to keep track of all my photos. I'm designing it so that I can later extend the program to store the citations in all my papers.

For better or for worse, I've settled on a simple flat file system with fixed width fields. That may not be the most efficient use of memory, and the files sizes get fluffed up, but it has some super advantages: No delimiters so the files easily remain human readable with nice formatting, and easy array access using simple array indexing.

I like the last part particularly because it lays within my skills. One thing about this database is that I need to be able to make a lot of entries FAAAA…ST! The bottleneck emerges in the user input part, not memory, and not disk space, so I'm concentrating on making the input wicked fast and easy.

I've learned that getting user input in C can be a tricky thing. C's string handling library has a lot of functions that can be exploited to generate buffer overflow. So… For this code, I implemented some advice from Robert C. Seacord's Secure Coding in C and C++ Specifically the chapter, "Strings and Buffer Overflows."

Here's a link: http://www.informit.com/articles/article.aspx?p=2036582&seqNum=5

Seacord suggests that using fgets() to process input lines, while possible to do securely, has performance limitations. (I like fast. Don't you?) He further suggests using getchar() instead. That's what I did.

Here's his suggestion for the use of a while loop for using getchar() securely:

while(( (ch = getchar()) != \n) && ch != EOF)

In my code below, I tweak that a touch.

There are several things I need the I/O to do. First, if the input entered is too long, I WANT it to truncate. Even though I'm maybe the only user, I might make a mistake. Second, if the input is shorter than the field width—which is mainly the case—I want to pad that field with spaces to the right.

That's the problem I'm having. More on that in a bit.

This whitespace keeps the flatfile clean looking, and, again, makes indexing extremely easy. I need to just hit the "Enter" key and move on to the next entry, sequentially, knowing the computer has formatted the data just the way I want it.

(This is actually a crude implementation E.F. Codd's principle that the data must be shielded from direct access. Everything must be checked, polished, parsed, etc., before it gets into storage. This prevents data corruption.)

In this code, I've cut all that last out because it's just noise, and I get pretty sick of trying to read other people's code who post their whole damn program with all sorts of extraneous stuff instead of just the part that's giving them trouble. At the same time, I like to post a complete program that can just be selected, copied, pasted, saved, and compiled. So I've done that here. I've left in the comments and my little checks which I'll uncomment to make sure everything is as it should be, then comment out again, so this is not the simplest possible code, but what's that famous Einstein quote about too simple?

Anyhow, I know all that is a bit long, but I wanted to sketch out the design principles I was using. You might have a useful critique. I sure as hell could be taking a dumb approach.

The exact problem I have is just how to pad those fields with exactly the right amount of whitespace.

I did figure out a solution. Sort of.

But it seems like a hack to me.

I just suspect there is a more efficient, fast, or elegant way to accomplish what I'm trying to do. The "hack" part is calling a second print function and using the difference between the character count and the maxlength constant to add whitespace after the data. See lines 27-39 for that. It works… But?

Shouldn't I just be able to pad the array directly?

Can't figure it out!

Here's the code:

#include <stdio.h>
#include <stdlib.h>

/** array-ops/5.c
    Template for inputting data from stdin using getchar().
        Sets arraymax to prevent overflow, truncates array if smaller than 
    arraymax, right pads short entries with spaces, and quits gracefully. 
    (Really? You're /sure/ about that last?) */

#define ARRAYMAX 8

int main(int argc, char *argv[])
{
    int ch;
    int count;
    char array[ARRAYMAX];

    ch = getchar();
    count = 0;
    // no overflows, unclosed processes, or extra keystrokes needed
    while(count < ARRAYMAX && ch != '\n' && ch != EOF) {
        array[count++] = ch;

        ch = getchar();
    }

    int diff = (ARRAYMAX - count);
    //printf("count: %d\n", count); // check
    //printf("diff: %d\n", diff); // check again. off-by-one?

    int i;
    for(i = 0; i < count; i++) {
        printf("%c", array[i]);
    }

    int j;
    for(j = 0; j < diff; j++) {
        printf("%s", " ");
    }

    //printf("|\n"); // check, spaces really there?
    printf("\n");

    return 0;
}

BTW, I really searched for the answer to this before posting. Feel free to knock me over with a feather, but it seemed that everyone was trying to solve a slightly different problem, especially a cavalier indifference to data protection and buffer overflows. So I don't think this is a duplicate question.

[edit] Here is the revised code. It incorporates Joachim's solution and an if-else loop to isolate a truncated string. It's still not the best, but...

#include <stdio.h>
#include <stdlib.h>

/** array-ops/5r.c
    Template for inputting data from stdin using getchar().
        Sets arraymax to prevent overflow, truncates array if smaller than 
    arraymax, right pads short entries with spaces, and quits gracefully. */

#define ARRAYMAX 8

int main(int argc, char *argv[])
{
    int ch;
    int count;
    char array[ARRAYMAX];

    ch = getchar();
    count = 0;
    // no overflows, unclosed processes, or extra keystrokes needed
    while(count < ARRAYMAX && ch != '\n' && ch != EOF) {
        array[count++] = ch;

        ch = getchar();
    }

    int diff = (ARRAYMAX - count);
    printf("count: %d\n", count); // check
    printf("diff: %d\n", diff); // check again for off-by-one

    if(count == ARRAYMAX) {
        printf("%.*s", ARRAYMAX, array);
    } else {
        printf("%.*s%*c", count, array, diff, ' ');
    }

    printf("|--array ends there\n"); // check, spaces really there?
    //printf("\n");

    return 0;
}
2

There are 2 answers

8
Some programmer dude On BEST ANSWER

If you see e.g. this printf reference, you will might notice the * format modifier, that can be used to set the field width or precision through an argument.

Can be used such as

printf("%.*s%*c\n", count, array, diff, ' ');

This will print count characters from array, then print one space right-justified by diff characters, and end it with a newline.

For an input of e.g. "ab\n", it should print "ab " followed by a newline.

8
David C. Rankin On

In your case, the easiest way would be to initialize array to all spaces (hex 0x20). Then, no matter what the input, your array is always space-padded to 8-chars. This works regardless of the length of the input, and without having to worry about the number of characters input, or calculating the length of the pad. Further, it does not rely on characters being input. If the user (you), just hit [enter], you still get your 8-char padded array.:

#include <stdio.h>
#include <stdlib.h>

#define ARRAYMAX 8

int main()
{
    int ch;
    int count = 0;
    char array[ARRAYMAX] = { 0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20 };
    int i = 0;

    while((ch = getchar()) && count < ARRAYMAX && ch != '\n' && ch != EOF)
        array[count++] = ch;

    printf("\n         01234567\n");

    printf(" array: '");
    for (i = 0; i < ARRAYMAX; i++)
        printf("%c", array[i]);

    printf ("'\n\n");

    return 0;
}

output:

$ ./bin/padstr8
Hi

         01234567
 array: 'Hi      '

$ ./bin/padstr8
It's all Good

         01234567
 array: 'It's all'

Note: if you are using gcc, you can initialize all elements of array with:

char array[ARRAYMAX] = { [0 ... 7] = 0x20 }; 

or more readable:

char array[ARRAYMAX] = { [0 ... 7] = ' ' };

That cuts down initializing every element.


This exercise started me off on a tangent to see what type of function I could write that would take a char array, whether allocated (on stack or heap), or not at all and simply passed as NULL pointer, and an accompanying field size that would allow me to fill a fixed field size and make that available in main(). I've tinkered a bit, and hit on something that if I don't find any more hidden gotchas, seems like it is doing a decent job. It is passed along for food for thought:

#include <stdio.h>
#include <stdlib.h>

#define ARRAYMAX 8

/** fill 'str' with input from stdin rt-padded to 'szfld'.
*  if 'str' exists and has storage, 'str' is filled with
*  input up to a maximum size of 'szfld' chars. If input
*  is less than 'szfld', 'str' is rt-padded with spaces.
*  if 'str' is 'NULL', 'szfld' chars are allocated.
*  NOTE: 'str' is NOT null-terminated. (intentionally)
*/
char *fillfield (char *str, int szfld)
{
    char ch = 0;
    int  count = 0;
    int  i = 0;

    szfld = (szfld > ARRAYMAX) ? ARRAYMAX : szfld;

    if (str)
        for (i = 0; i < szfld; i++)
            str[i] = 0;
    else
        str = calloc (szfld, sizeof (char));

    printf ("\n Input: ");
    while((ch = getchar()) && count < ARRAYMAX && ch != '\n' && ch != EOF)
        str[count++] = ch;

    if (count >= ARRAYMAX && ch != '\n')
        while ((ch = getchar()) != '\n' && ch != EOF) ;

    char *p = str + szfld - 1;
    while (!*p && p >= str) *p-- = 0x20;

    return str;    
}

int main() {

    char field_1[6];
    fillfield (field_1, 6);                 /* fill existing array */

    char *field_2 = fillfield (NULL, 6);    /* allocate/fill array */

    printf ("\n field_1: '%.*s'\n", 6, field_1);
    printf (" field_2: '%.*s'\n\n", 6, field_2);

    if (field_2) free (field_2);            /* clean up allocation  */

    return 0;
}

output:

$ ./bin/padstrf

 Input: hi

 Input: hi

 field_1: 'hi    '
 field_2: 'hi    '

$ ./bin/padstrf

 Input:  hi

 Input:  hi

 field_1: ' hi   '
 field_2: ' hi   '

$ ./bin/padstrf

 Input: truncate

 Input: truncate

 field_1: 'trunca'
 field_2: 'trunca'

$ ./bin/padstrf

 Input:

 Input:

 field_1: '      '
 field_2: '      '