basename on a buffer causes a segmentation fault


I'm experimenting with basename right now and I've run into a case that is really weird (at least to me). Here's the code:

char buffer[300];
char* p;

strcpy(buffer, "../src/test/resources/constraints_0020_000");
printf("%d\n", strcmp(basename("../src/test/resources/constraints_0020_000"), "constraints_0020_000")); //works as expected
printf("assert testBasename02");
printf("%d\n", strcmp(basename(buffer), "constraints_0020_000") == 0);
printf("done 1\n"); //goes in segmentation fault
printf("%d\n", strcmp(basename(&buffer), "constraints_0020_000") == 0);
printf("done 2\n"); //goes in segmentation fault
printf("%d\n", strcmp(basename(&buffer[0]), "constraints_0020_000") == 0);
printf("done 3\n"); //goes in segmentation fault
p = malloc(strlen("../src/test/resources/constraints_0020_000") +1);
strcpy(p, "../src/test/resources/constraints_0020_000");
printf("%d\n", strcmp(basename(p), "constraints_0020_000") == 0); //works as expected
free(p);
printf("all done\n");

The first strcmp works exactly as expected; it is the second one that puzzles me: why would passing a buffer cause a segmentation fault? I tried writing the buffer argument in several different ways, but the result is the same.

I can of course live with this behaviour, but I don't really understand what difference it makes to basename whether I pass it a const char* or a buffer (which in the end is also a char*).

Is there a document that explains this behaviour? Is it just me? I looked for explanations but couldn't find any.

Here are the specifications of my computer (in case you need them):

  • OS: Ubuntu 16.04 (64-bit, virtualized on Windows 10 64-bit);
  • CPU (not that I think it is useful): Intel® Core™ i5-3230M CPU @ 2.60GHz × 2.
2

There are 2 answers

7
Sourav Ghosh On BEST ANSWER

According to the man page,

Bugs

In the glibc implementation of the POSIX versions of these functions they modify their argument, and segfault when called with a static string like "/usr/". [...]

Basically,

 basename("../src/test/resources/constraints_0020_000")

invokes undefined behavior, as this is an attempt to modify the string literal.


Note: The wording in the man page is slightly imprecise. It should read:

In the glibc implementation of the POSIX versions of these functions they modify their argument, and invoke undefined behavior when called with a static string like "/usr/". [...]

A segmentation fault is one of the side effects of UB, but not the only one.

FWIW, any attempt to modify a string literal invokes UB by itself. Quoting C11, chapter §6.4.5, String literals:

[...] If the program attempts to modify such an array, the behavior is undefined.
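
To illustrate the distinction, here is a minimal sketch. The helper names truncate_at_last_slash and demo_writable_array are hypothetical and are not part of any library; the first only mimics the kind of in-place write that an implementation of basename() is permitted to perform. An array initialized from a literal is a private, writable copy, whereas a pointer to the literal itself must be treated as read-only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: overwrites the last '/' with '\0', similar to the
 * in-place modification a basename()/dirname() implementation may make.
 * The caller must pass writable storage, never a string literal. */
void truncate_at_last_slash(char *path)
{
    assert(path != NULL);
    char *slash = strrchr(path, '/');
    if (slash != NULL)
        *slash = '\0';
}

/* Returns 1 if the in-place edit on a writable array gives "/usr". */
int demo_writable_array(void)
{
    /* An array initialized from a literal holds its own modifiable copy. */
    char writable[] = "/usr/lib";

    /* By contrast, calling truncate_at_last_slash("/usr/lib") directly
     * would try to write into the literal: undefined behavior. */
    truncate_at_last_slash(writable);
    return strcmp(writable, "/usr") == 0;
}
```

Calling demo_writable_array() is well defined because every write lands in the array; the commented-out variant is the pattern to avoid.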


EDIT:

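As discussed in the follow-up comments, an additional problem was a missing header file. You need to add

  #include <libgen.h>

so that the declaration of basename() is visible. Without it, a compiler that accepts implicit declarations assumes basename() returns int, which can truncate the returned pointer on a 64-bit system.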
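
As a rough sketch of the corrected call, assuming the POSIX basename() from <libgen.h>: the helper name last_component_equals is hypothetical, and the 300-byte buffer simply mirrors the one in the question.

```c
#include <assert.h>
#include <libgen.h>   /* declares char *basename(char *) */
#include <string.h>

/* Hypothetical helper: returns 1 if the last component of path equals
 * expected, 0 otherwise (including when path does not fit the buffer). */
int last_component_equals(const char *path, const char *expected)
{
    char buffer[300];   /* writable storage, same size as in the question */

    assert(path != NULL && expected != NULL);
    if (strlen(path) >= sizeof buffer)
        return 0;       /* too long for this sketch */

    /* basename() may modify its argument, so hand it the writable copy. */
    strcpy(buffer, path);
    return strcmp(basename(buffer), expected) == 0;
}
```

With the header included, the compiler knows that basename() returns a char*, so the result can be passed straight to strcmp().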

0
Andrew Henle On

Per the POSIX standard:

The basename() function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to basename(). The returned pointer might also be invalidated if the calling thread is terminated.

Per the Linux man page:

Both dirname() and basename() may modify the contents of path, so it may be desirable to pass a copy when calling one of these functions.

You're calling basename() with a static string, which is likely read-only, thus causing a SEGV when basename() attempts to modify the string.
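
One way to follow both quotes above, sketched under the assumption that the caller wants the result in its own buffer: the helper name copy_basename and its return convention are hypothetical. It passes a heap copy to basename() and copies the result out before the copy is freed, since the returned pointer may refer to the argument or to internal storage.

```c
#include <assert.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes the last component of path into out.
 * Returns 0 on success, -1 on allocation failure or if out is too small. */
int copy_basename(const char *path, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL);

    /* Work on a private copy so the caller's string (possibly a literal)
     * is never modified. */
    char *copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, path);

    /* The result may point into copy or into internal storage, so copy it
     * out before freeing copy or calling basename() again. */
    const char *name = basename(copy);
    int status = -1;
    if (strlen(name) < out_size) {
        strcpy(out, name);
        status = 0;
    }

    free(copy);
    return status;
}
```

Because the input is taken as const char*, this helper can be called with a string literal without risking a write to read-only memory.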