strsep with multiple delimiters: strange result


I am currently having some strange results when using strsep with multiple delimiters. My delimiters include the TAB character, the space character, as well as > and <.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()
{
    char buffer[50];
    char *curr_str = NULL;
    const char delim[4] = "\t >";
    //const char delim[4] = "\t ><"; // This does not work
  
    snprintf(buffer, 50, "%s", "echo Hello");
  
    char *str_ptr = buffer;
  
    curr_str = strsep(&str_ptr, delim);
  
    if (curr_str != NULL)
        printf("%s\n", curr_str);

    curr_str = strsep(&str_ptr, delim);
    if (curr_str != NULL)
        printf("%s\n", curr_str);
    return (0);
}

This output is what I expect.

echo 
Hello

However, as soon as I add the '<' character to the delimiters, I get

cho

Somehow, the first character gets cut off. Is there a reason behind why this is occurring?

Thank you.


There are 3 answers

AudioBubble (best answer)

The second argument to strsep, delim, must be a null-terminated string (like all strings in C), so you have to leave room for the terminating null character:

const char delim[5] = "\t ><"; // This does work
//const char delim[] = "\t ><"; // or this

If the string is not terminated, strsep will read past the end of the array and treat whatever bytes it finds there as additional delimiters, which is what happened in your case.

ryyker

"...the first character gets cut off. Is there a reason behind why this is occurring?"

Yes: undefined behavior, caused by passing a char array that is not null-terminated to a C string function.

When const char delim[4] is initialized with "\t ><", the four characters fill the array and there is no room left for a null terminator, so delim is just a char array, not a C string. It may or may not produce visibly strange results, but passing it to any of the C string functions (such as curr_str = strsep(&str_ptr, delim);) invokes undefined behavior.

const char delim[4];

Has room for 4 chars.

"\t ><"  // contains exactly 4 chars

and can be pictured in memory like this:

|\t| |>|<|?|?|?|  // ? = unknown content, possibly no null termination
         ^end of owned memory

It should contain the following:

|\t| |>|<|\0|?|?|  // null termination  
            ^end of owned memory (5 chars wide)

This requires more room in the declaration, for example with either of the following:

const char delim[5] = "\t ><";

or

const char delim[] = "\t ><";

chqrlie

const char delim[4] = "\t ><"; does not define a proper C string because there is no room for the null terminator. Hence any non-zero bytes that follow delim in memory are treated as part of the delimiter string.

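This is of course undefined behavior. In your case, the compiler may have placed delim immediately before buffer without any padding, so the delimiter sequence effectively continues with all the characters of the string "echo Hello". The first call to strsep therefore finds 'e' in the delimiter set and returns an empty string. That call also overwrites the 'e' with '\0', which now terminates delim, so the second call uses only "\t ><" and returns "cho".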

You can verify on this Godbolt instance that this is indeed the case in 32-bit mode, but not in 64-bit mode (remove the -m32 compiler option).

This problem is easy to fix. You can either let the compiler determine the length of the delim array:

const char delim[] = "\t ><";

or you can use a pointer to a string constant:

const char *delim = "\t ><";