C-string alternatives to strtok_r() and strsep() that don't modify the original string pointer?


I was taking a look at the 2 C-string functions, strtok_r() and strsep(), and noticed both functions modify the location of the original string passed in.

Are there any other C-string functions that don't modify the original string passed in?

In my application, the original string is dynamically allocated, so I wish to free the original string after the parsing is done.

An example with strtok_r()

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char * str = strdup("Tutorial and example");
    char* token;
    char* rest = str;

    printf("%s\n", rest);
    while ((token = strtok_r(rest, " ", &rest)))
        printf("%s\n", token);
    printf("\n%s\n",str);
    return(0);
}

Output

Tutorial and example
Tutorial
and
example

Tutorial

In the very last line, I wish for str to point to the unmodified cstring "Tutorial and example".

A similar output occurs with strsep() as well.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char * str = strdup("Tutorial and example");
    char* token;
    char* rest = str;

    printf("%s\n", rest);
    while ((token = strsep(&rest, " ")))
        printf("%s\n", token);
    if (rest != NULL)
        printf("%s\n", rest);

    printf("%s\n", str);
    return(0);
}

Thank you.


There are 2 answers

1
SergeyA On BEST ANSWER

I think you are misunderstanding strtok_r. It does not change the location of the original string; in fact, it cannot. C passes arguments by value, so the function cannot change the pointer passed to it in a way that is visible to the calling code. (In your example, rest does change, because its address is passed as the save pointer, but str does not.)

What it can and will do is modify the contents of the string itself, by replacing delimiters with null terminators. So, to answer your original question:

In my application, the original string is dynamically allocated, so I wish to free the original string after the parsing is done.

You do not have to do anything special. You can and should free the original string once you are done with it.

You see only the single word Tutorial printed because the character after it was replaced with a null terminator, and printf stops there. If you inspect the string character by character, you will see that it has otherwise remained intact.

0
Vlad from Moscow On

Although the mentioned string functions change the original string, the pointer str still points to the dynamically allocated memory, and you may use it to free that memory.

If you do not want to change the original string, you can use the standard C string functions strspn and strcspn.

For example

#include <stdio.h>
#include <string.h>

int main(void) 
{
    const char *s = "Tutorial and example";
    const char *separator = " \t";
    
    puts( s );
    
    for ( const char *p = s; *p; )
    {
        p += strspn( p, separator );
        
        const char *prev = p;
        
        p += strcspn( p, separator );
        
        int width = p - prev;
        
        if ( width ) printf( "%.*s\n", width, prev );
    }
    
    return 0;
}

The program output is

Tutorial and example
Tutorial
and
example

Using this approach you can dynamically allocate memory for each extracted substring.

For example

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) 
{
    const char *s = "Tutorial and example";
    const char *separator = " \t";
    
    puts( s );
    
    size_t n = 0;
    char **a = NULL;
    int success = 1;
    
    for ( const char *p = s; success && *p; )
    {
        p += strspn( p, separator );
        
        const char *prev = p;
        
        p += strcspn( p, separator );
        
        if ( p - prev != 0 )
        {
            char *t = malloc( p - prev + 1 );
            
            if ( ( success = t != NULL ) )
            {
                t[p - prev] = '\0';
                memcpy( t, prev, p - prev );
            
                char **tmp = realloc( a, ( n + 1 ) * sizeof( char * ) );
                
                if ( ( success = tmp != NULL ) )
                {
                    a = tmp;
                    a[n++] = t;
                }
                else
                {
                    free( t );
                }
            }
        }
    }
    
    for ( size_t i = 0; i < n; i++)
    {
        puts( a[i] );
    }

    for ( size_t i = 0; i < n; i++)
    {
        free( a[i] );
    }
    
    free( a );
    
    return 0;
}

The program output is the same as shown above.

Tutorial and example
Tutorial
and
example