Replacing spaces with tabs, with or without arrays


I followed some advice and started reading The C Programming Language. The book has several exercises at the end of each chapter.

I've just done the following exercise:

Write a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?

My solution to this problem was the following:


#include <stdio.h>
#include <stdlib.h>

int get_string(char string[], int index)
{
    for(int position = 0; position < index; position++)
    {
        string[position] = getchar();
        if(string[position] == '\n')
        {
            string[position] = '\0';
            return position;
        }
    }
    return index;
}

int get_spaces(char string[], int i)
{
    int temp_i = 1;
    while(string[i+1] == ' ')
    {
        temp_i++;
        i++;
    }
    return temp_i;
}

int get_tabs(int spaces)
{
    int tabs = 0;
    tabs = spaces / 4;
    return tabs;
}

void entab(char string[], int max_index)
{
    int i = 0;
    int spaces = 0;
    int tabs = 0;
    while(i <= max_index)
    {
        if(string[i] == '\t')
        {
            printf("\t");
            i++;
        }
        else if(string[i] == ' ')
        {
            spaces = get_spaces(string, i);
            i = i + spaces;
            if(spaces == 4)
            {
                printf("\t");
                spaces = 0;
            }
            else if (spaces < 4)
            {
                for(int i = 0; i < spaces; i++)
                {
                    putchar(' ');
                }
                spaces = 0;
            }
            else
            {
                tabs = get_tabs(spaces);
               // printf("%d", tabs);
                spaces = spaces - (tabs*4);
               // printf("%d", spaces);
                for(int i = 0; i < tabs; i++)
                    printf("\t");
                for(int i = 0; i < spaces; i++)
                    printf(" ");
                tabs = 0;
                spaces = 0;
            }
        }
        else
        {
            putchar(string[i]);
            i++;
        }
    }
}
int main()
{

    char string[100] = "";
    int howlong = get_string(string, 100);

    entab(string, howlong);

    return 0;
}

2 questions:

  1. How is it possible that a tab might be what is needed to reach a tab stop (quoting the exercise: "When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?")? I mean, to reach it you need a space, so you can't prefer one or the other; they are two different cases. Or not?

  2. I presume it is not possible to do this directly in the first while loop, without storing the input in an array. Or is it?

I'm pretty new to C, and to programming in general, so don't overestimate my abilities. I presume you can see that I'm not good at it yet; I'm trying to learn.

I don't know how to shrink the program down. It seems like an easy task, yet I don't know how to approach the problem.

And I think it doesn't work properly.

EDIT:

I have rewritten everything. I'm sure it's working, but I don't know why it prints extra spaces. Where are these spaces coming from?


There are 4 answers

Answer by ikegami

When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?

The point is to reduce the size of the output. Since a space and a tab are both one character, it doesn't matter, so use whichever you want.

I presume it is not possible to do this directly in the first while loop, without storing the input in an array. Or is it?

No array is necessary. It's just a question of tracking three variables:

  • The current position in the line.
  • Whether we are currently in whitespace.
  • At which position the current span of whitespace started.

Algorithm

(Supports space, tab and LF as control characters.)

  1. Initialize the current position to 1.
  2. Initialize the in-whitespace flag to false.
  3. Create the whitespace start position variable.
  4. Loop,
    1. Read a character.
    2. If EOF has been reached,
      1. Break out of the loop.
    3. If the character is a LF,
      1. Set the in-whitespace flag to false.
      2. Set the current position to 1.
      3. Output the LF.
    4. Else,
      1. If the character is a space,
        1. If the in-whitespace flag is false,
          1. Set the in-whitespace flag to true.
          2. Set the whitespace start to the current position.
        2. Add one to the current position.
      2. Else, if the character is a tab,
        1. If the in-whitespace flag is false,
          1. Set the in-whitespace flag to true.
          2. Set the whitespace start to the current position.
        2. Advance the current position to the next tab stop.
      3. Else,
        1. If the in-whitespace flag is true,
          1. Calculate the appropriate number of tabs and spaces to output based on the current position and the whitespace start position.
          2. Output them.
          3. Set the in-whitespace flag to false.
        2. Output the character.
        3. Add one to the current position.
Answer by Joël Hecht
  1. How is it possible that a tab might be what is needed to reach a tab stop (quoting the exercise: "When either a tab or a single blank would suffice to reach a tab stop, which should be given preference?")? I mean, to reach it you need a space, so you can't prefer one or the other; they are two different cases. Or not?

Suppose the tab width is 4 and columns are numbered from 0, so the tab stops are at columns 4, 8, 12, and so on. In column 3, printing either a space or a tab has the same result: the cursor reaches column 4. Since a space and a tab are each a single character, choosing one or the other makes no difference to the number of characters printed.

  2. I presume it is not possible to do this directly in the first while loop, without storing the input in an array. Or is it?

It is possible to replace spaces with tabs in one loop, without using an array. Here is the code:

#include <stdio.h>
#include <stdlib.h>

#define TAB_WIDTH 4
#define TAB_OR_SPACE '\t'

static void endtab(void) {
    int c;  // int, not char: getchar() returns an int so that EOF can be told apart from any valid character
    int nb_spaces;
    int column;

    column = 0;
    nb_spaces = 0;
    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            column++;
            nb_spaces++;
            if (column % TAB_WIDTH == 0) {
                // A tab position is reached, so a tab can replace some spaces
                if (nb_spaces == 1) {
                    // Tab or space ? both works
                    c = TAB_OR_SPACE;
                } else {
                    c = '\t';
                }
                nb_spaces = 0;
            } else {
                // We could use a simple "continue" here, but setting c to 0
                // skips the "putchar" below and keeps the if/else-if pattern consistent
                c = 0;
            }
        } else if (c == '\t') {
            column = 0; // We don't need the exact column, we just need to be aligned on a tab stop
            nb_spaces = 0;
        } else {
            while (nb_spaces > 0) {
                putchar(' ');
                nb_spaces--;
            }
            if (c == '\r' || c == '\n') {
                column = 0;
            } else {
                column++;
            }
        }
        if (c != 0) {
            putchar(c);
        }
    }
    // Don't forget remaining spaces
    while (nb_spaces > 0) {
        putchar(' ');
        nb_spaces--;
    }
    putchar('\n');
}

int main(int argc, char **argv) {
    endtab();
    return EXIT_SUCCESS;
}

Edit: corrected a bug where some spaces could be missing at the end of a line and in multi-line input.

Answer by Eric Postpischil
#include <stdio.h>


#define TabWidth    4


int main(void)
{
    int CC = 0; //  Current physical output column is zero.
    int DC = 0; //  Current desired output column is zero.

    while (1)
    {
        //  Request next character.  If none, exit.
        int c = getchar();
        if (c == EOF)
            break;

        //  For a newline character, output it and reset for new line.
        else if (c == '\n')
        {
            putchar(c);
            CC = 0;
            DC = 0;
        }

        //  For a space, update desired column.
        else if (c == ' ')
            ++DC;

        /*  For a tab, update desired column to next tab stop.  We do this by
            "backing up" as many positions as we are beyond the current tab
            stop and then advancing a full tab width.
        */
        else if (c == '\t')
            DC = DC - DC % TabWidth + TabWidth;

        /*  For any other character, output suitable tabs and spaces and then
            the character.
        */
        else
        {
            /*  Output tabs until we reach the tab stop at or before the
                desired position.
            */
            while (CC/TabWidth < DC/TabWidth)
            {
                putchar('\t');
                CC = CC - CC % TabWidth + TabWidth;
            }

            //  Output spaces until we reach the desired position.
            while (CC < DC)
            {
                putchar(' ');
                ++CC;
            }

            //  Output the character.
            putchar(c);
            ++CC;
        }
    }
}
Answer by arfneto

It is possible to do this without an array. entab() is just a filter on its input: all that matters are the tab stop size and the current position in the output. If you need to entab() a string, you can use two pointers.

As I wrote in a comment above, tab stops are positions in a line; it is better to see them as column numbers. If the tab width is 4 and the first column is 0, the tab stops are at columns 4, 8, 12, 16, 20, and so on. If a line is a tab followed by X (in C notation, "\tX\n"), the X will be displayed at column 4. If you use only spaces and want an X in column 10, the string is of course "          X" (ten spaces and the X), which uses 11 bytes. Your mission in entab is to transform such a string into "\t\t  X" (two tabs, two spaces and the X), which uses only 5 bytes and shows the same thing on screen.

C Example

Suppose we have

int entab(
    const char* file_in, const char* file_out,
    const int tab);

And also

int detab(
    const char* file_in, const char* file_out,
    const int tab);

So if we take the output of entab and feed it to detab, we would expect to get a file identical to the original file given to entab.

main.c for such a test

int main(void)
{
    const char* in     = "original20.c";
    const char* interm = "tab.txt";
    const char* out    = "out.c";

    const int tab_s = 4;

    int status = entab(in, interm, tab_s);
    printf(
        "\tentab(\"%s\",\"%s\",%d) returned %d\n", in,
        interm, tab_s, status);

    status = detab(interm, out, tab_s);
    printf(
        "\tdetab(\"%s\",\"%s\",%d) returned %d\n", interm,
        out, tab_s, status);

    return 0;
}

original20.c for the example

I will use the first 20 lines of the code posted by the author:

#include <stdio.h>
#include <stdlib.h>
#define MAX_L                                              \
    1000  // max length of array is 999. -> [1000] is \0
#define TAB_W 4  // width of atab

void entab(void)
{
    char string[MAX_L];

    char c;
    int  i = 0;
    // get string
    while ((c = getchar()) != '\n')
    {
        string[i] = c;
        if (string[i] == '\n') { string[i + 1] = '\0'; }
        else { string[i] = c; }
        ++i;
    }

output of the test

       entab("original20.c","tab.txt",4) returned 0
       detab("tab.txt","out.c",4) returned 0
PS C:\SO-EN> cmd /c fc original20.c out.c
Comparing files original20.c and OUT.C
FC: no differences encountered

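tab.txt is the output of entab. It shows the same thing on screen as the original, because every run of blanks was replaced by tabs and spaces that reach the same columns.

The original size was 475 bytes; tab.txt has 395.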

#include <stdio.h>
#include <stdlib.h>
#define MAX_L                                              \
    1000  // max length of array is 999. -> [1000] is \0
#define TAB_W 4  // width of atab

void entab(void)
{
    char string[MAX_L];

    char c;
    int  i = 0;
    // get string
    while ((c = getchar()) != '\n')
    {
        string[i] = c;
        if (string[i] == '\n') { string[i + 1] = '\0'; }
        else { string[i] = c; }
        ++i;
    }

Complete C code

#define ENTER '\n'
#define TAB '\t'
#define SPACE 0x20
#include <stdio.h>
#include <stdlib.h>

int detab(
    const char* file_in, const char* file_out,
    const int tab);
int entab(
    const char* file_in, const char* file_out,
    const int tab);

int flush(char* const, size_t, FILE*);

int main(void)
{
    const char* in     = "original20.c";
    const char* interm = "tab.txt";
    const char* out    = "out.c";

    const int tab_s = 4;

    int status = entab(in, interm, tab_s);
    printf(
        "\tentab(\"%s\",\"%s\",%d) returned %d\n", in,
        interm, tab_s, status);

    status = detab(interm, out, tab_s);
    printf(
        "\tdetab(\"%s\",\"%s\",%d) returned %d\n", interm,
        out, tab_s, status);

    return 0;
}

int detab(
    const char* file_in, const char* file_out,
    const int tab_s)
{
    int    ch    = 0;
    size_t n_col = 0;
    FILE*  in    = fopen(file_in, "r");
    if (in == NULL) return -1;
    FILE* out = fopen(file_out, "w");
    if (out == NULL)
    {   // check `out`, not `in`, and close the input before failing
        fclose(in);
        return -2;
    }

    while ((ch = fgetc(in)) >= 0)
    {
        switch (ch)
        {
            case TAB:
                for (int i = 0; i < tab_s; i += 1)
                {   // up to tab_s spaces
                    fprintf(out, " ");
                    n_col += 1;
                    if (n_col % tab_s == 0) break;
                }
                break;
            case ENTER:
                fprintf(out, "%c", ch);
                n_col = 0;
                break;
            default:
                fprintf(out, "%c", ch);
                n_col += 1;
                break;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}

int entab(
    const char* file_in, const char* file_out,
    const int tab_s)
{
    int    ch       = 0;
    size_t n_col    = 0;
    char   n_spaces = 0;

    if (tab_s < 2) return -1;
    FILE* in = fopen(file_in, "r");
    if (in == NULL) return -2;
    FILE* out = fopen(file_out, "w");
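    if (out == NULL)
    {   // check `out`, not `in`, and close the input before failing
        fclose(in);
        return -3;
    }
    char* buf = malloc(tab_s);
    if (buf == NULL)
    {   // don't leak the open files
        fclose(in);
        fclose(out);
        return -4;
    }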
    size_t ix = 0;

    while ((ch = fgetc(in)) >= 0)
    {
        switch (ch)
        {
            case ENTER:
                if (ix == 0)
                {
                    fprintf(out, "\n");
                    break;
                };
                // not empty
                flush(buf, ix, out);
                ix = 0;
                fprintf(out, "\n");
                break;
            default:
                *(buf + ix) = ch;
                ix += 1;
                if (ix == tab_s)
                {
                    flush(buf, ix, out);
                    ix = 0;
                }
                break;
        }
    };
    if (ix > 0)
    {
        flush(buf, ix, out);
        fprintf(out, "\n");
    }
    fclose(in);
    fclose(out);
    free(buf);
    return 0;
}

int flush(char* const buf, size_t ix, FILE* out)
{
    if (out == NULL) return -1;
    if (buf == NULL) return -2;
    char* rr = buf + ix - 1;
    for (char* p = rr ;p >= buf; p-=1)
    {
        if (*p != SPACE) break;
        *p = TAB;
        rr = p;
    };
    for (char*p = buf; p<=rr; p += 1)
        fprintf(out, "%c", *p);
    return 0;
}

This example uses a small array the size of a tab stop, for no special reason; it is not required. Note that performance is not an important issue here, since we are using file I/O, which is slow by nature.