Remove whitespace using strsep()?


I'm trying to read command lines such as "ls -la /tmp" and split them into an array of char pointers. I read the input with getline() and pass it, along with the pointer array, to the function below, which returns the number of commands/arguments in "i" (3 in this case). I want to ignore or remove any whitespace and not segfault on empty lines, such as a line full of tabs, so that the array entries point to "ls", "-la", and "/tmp" respectively. Any help is greatly appreciated.

int string_split(char * input, char * commands[]) {
    int i = 0;
    const char * delim = " \t\r\n\a";
    //Remove trailing newline character
    input[strcspn(input, "\n")] = '\0';
    //Break user input into each command using fixed delimiter
    while((commands[i] = strsep(&input, delim)) != NULL) {
        //Each break counted then returned to reference number of command/arguments
        i++;
    }
    return i;
}

I'd prefer to use strsep over strtok/strtok_r; I've tried both and constantly run into segfaults. This is the code that runs before the function is called:

char * input = NULL;
size_t input_size = 0;
char * command_line[10];

int res = getline(&input, &input_size, stdin);

if(res == -1) {
    exit(0);
}
//String contains only newline character
else if(res == 1) {
    continue;
}
else {
//Break string into separate command and arguments, then store the number of commands/arguments
     int input_length = string_split(input, command_line);
//Code continues on to handling...

There is 1 answer

gulpr On

You can easily write such a function yourself.

#include <stdio.h>
#include <string.h>

/**
 * @brief Splits a string into substrings using a delimiter.
 *
 * This function splits the input string `str` into substrings based on the provided
 * delimiter string `delim`, and stores pointers to these substrings in the `argv`
 * array. The maximum number of substrings to store is specified by `argvsize`.
 *
 * @param argv [out] An array of pointers to store the substrings.
 * @param argvsize [in] The size of the `argv` array.
 * @param str [in,out] The input string to split. Will be modified during processing.
 * @param delim [in] The delimiter string used to split the input string.
 * @return The number of substrings stored in the `argv` array.
 *
 * @details This function modifies the input string by replacing delimiter characters
 * with null terminators to separate substrings. The `argv` array is populated with
 * pointers to the start of each substring. If there are more substrings than the
 * size of the `argv` array, only `argvsize` pointers are stored, and the last one
 * points to the unsplit remainder of the string.
 *
 * @note The input string `str` will be modified during processing.
 *
 * \code{c}
 * char input[] = "apple,orange,banana,grape";
 * char *substrings[4];
 * size_t num_substrings = splitstring(substrings, 4, input, ",");
 * for (size_t i = 0; i < num_substrings; ++i) {
 *     printf("Substring %zu: %s\n", i, substrings[i]);
 * }
 *
 * // Output:
 * // Substring 0: apple
 * // Substring 1: orange
 * // Substring 2: banana
 * // Substring 3: grape
 * \endcode
 *
 * @see strchr
 */
size_t splitstring(char **argv, size_t argvsize, char *str, const char *delim)
{
    size_t pos = 0;
    if(argv && argvsize && str && *str && delim)
    {
        memset(argv, 0, argvsize * sizeof(*argv));
        argv[pos++] = str;
        while(*str)
        {
            if(strchr(delim, *str))
            {
                /* Check the bound before writing, so argvsize == 1 cannot overflow argv. */
                if(pos >= argvsize) break;
                *str++ = 0;
                argv[pos++] = str;
            }
            else
            {
                str++;
            }
        }
    }
    return pos;
}

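To avoid counting empty strings, add if(strchr(delim, *str)) continue; after *str++ = 0;. This skips the empty strings produced by consecutive or trailing delimiters, but not by delimiters at the very start of the string, so argv[0] can still be empty.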
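The one-line change above still leaves an empty first token when the string begins with a delimiter, and a line consisting only of delimiters still counts as one token. As a rough sketch of one way to cover those cases, the scan can be built on strspn() and strcspn() instead. The name splitstring_skip is hypothetical and not part of the original answer; unlike splitstring, this sketch drops whatever is left of the string once argv is full.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of splitstring (name chosen for illustration):
 * skips runs of delimiters, so no empty tokens are stored, and never
 * writes past argv[argvsize - 1]. Modifies str in place. */
size_t splitstring_skip(char **argv, size_t argvsize, char *str, const char *delim)
{
    size_t pos = 0;

    if (!argv || !argvsize || !str || !delim)
        return 0;

    while (pos < argvsize)
    {
        str += strspn(str, delim);      /* skip any run of delimiters */
        if (*str == '\0')
            break;                      /* nothing left but delimiters */

        argv[pos++] = str;              /* start of a non-empty token */
        str += strcspn(str, delim);     /* advance to the end of the token */
        if (*str == '\0')
            break;                      /* token ran to the end of the string */
        *str++ = '\0';                  /* terminate the token */
    }
    return pos;
}
```

Called on "  ls \t -la   /tmp \n" with the delimiters " \t\r\n", this stores "ls", "-la" and "/tmp" and returns 3; a line containing only tabs and a newline returns 0.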

Usage of splitstring:

int main(void)
{
    char str[] = "ls -la /tmp";
    char *arr[5];

    size_t size = splitstring(arr, sizeof(arr) / sizeof(arr[0]), str, " ");
    for(size_t i = 0; i < size; i++)
    {
        printf("Command[%zu] = '%s'\n", i, arr[i]);
    }
}

https://godbolt.org/z/soPafqfrd

Result:

Command[0] = 'ls'
Command[1] = '-la'
Command[2] = '/tmp'
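Since the question prefers strsep(), the same idea can also be sketched on top of it. strsep() returns an empty string for every pair of adjacent delimiters, and the loop in the question never checks i against the size of the 10-element array, so a line with many tabs can write past its end. That is one plausible cause of the crashes, though the question does not show enough code to be sure. The function name string_split_bounded and the extra max parameter are assumptions made for this example, and _DEFAULT_SOURCE is defined on the assumption that the platform is glibc.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, where this exposes strsep() */
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of string_split from the question:
 * stores at most max non-empty tokens and returns how many were stored.
 * Modifies input in place. */
int string_split_bounded(char *input, char *commands[], size_t max)
{
    const char *delim = " \t\r\n\a";
    size_t i = 0;
    char *token;

    if (input == NULL || commands == NULL)
        return 0;

    /* Stop as soon as the array is full, before strsep() is called again. */
    while (i < max && (token = strsep(&input, delim)) != NULL)
    {
        if (*token == '\0')
            continue;           /* empty field between adjacent delimiters */
        commands[i++] = token;
    }
    return (int)i;
}
```

With the array from the question, the call would become string_split_bounded(input, command_line, 10). A line of only tabs then returns 0, which the caller can treat the same way as an empty line.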