How does "for ( ; *p; ++p) *p = tolower(*p);" work in c?


I'm fairly new to programming and was just wondering why this code:

for ( ; *p; ++p) *p = tolower(*p);

works to convert a string to lower case in C, when p points to a string?


There are 2 answers

Bathsheba On BEST ANSWER

To unpick this, let's assume p is a pointer to char that, just before the for loop, points to the first character of a string.

In C, a string is typically modelled as a sequence of contiguous char values ending with a 0 value, which acts as the null terminator.

*p evaluates to 0 once the null terminator is reached, and the for loop then exits. (The second expression in a for loop is its controlling condition: the loop keeps running while that expression is non-zero.)

++p advances to the next character in the string.

*p = tolower(*p) sets that character to lower case.
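The three pieces described above can be seen more easily if the for loop is spelled out as a while loop. This is only a sketch: the function name lower_ascii is made up for illustration, and it assumes the string holds ASCII characters only.

```c
#include <ctype.h>

/* Hypothetical helper: the loop from the question, rewritten as a while loop.
   Assumes s is a null-terminated string containing only ASCII characters. */
void lower_ascii(char *s)
{
    char *p = s;              /* p starts at the first character */

    while (*p) {              /* *p is 0 (false) only at the null terminator */
        *p = tolower(*p);     /* overwrite the current character */
        ++p;                  /* move on to the next character */
    }
}
```

Calling it on a modifiable array such as char s[] = "Hello"; changes the array in place. A string literal must not be passed, because writing to a literal is not allowed.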

Cheers and hth. - Alf On

In general, this code:

for ( ; *p; ++p) *p = tolower(*p);

does not reliably convert a string to lower case in C, even when p points to a string.

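It does work for pure ASCII. But char is a signed type on many platforms, and tolower requires an argument that is either representable as unsigned char (so non-negative) or equal to the special value EOF. Passing a negative char value, such as a non-ASCII byte, therefore gives Undefined Behavior in general.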

To avoid that, cast the argument to unsigned char, like this:

for ( ; *p; ++p) *p = tolower( (unsigned char)*p );

Now it can work for single-byte encodings like Latin-1, provided you have set the correct locale via setlocale, e.g. setlocale( LC_ALL, "" );. However, note that the very common UTF-8 encoding does not use a single byte per character, so this byte-by-byte loop cannot lowercase non-ASCII UTF-8 text. To deal with UTF-8 text you can convert it to a wide string and lowercase that.
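One way the wide-string route might look is sketched below. The function name lower_multibyte and its return convention are assumptions made for this example. It also assumes the caller has already selected a locale that matches the text's encoding, e.g. with setlocale( LC_ALL, "" );. Which characters towlower actually changes depends on that locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercases the multibyte string `in` into `out`
   (capacity `outsize` bytes) by going through a wide string.
   Assumes setlocale() has already selected a locale matching the encoding
   of `in`. Returns 0 on success, -1 on failure; on failure the contents
   of `out` should not be used. */
int lower_multibyte(const char *in, char *out, size_t outsize)
{
    size_t len = strlen(in);

    /* A multibyte string never has more characters than bytes. */
    wchar_t *ws = malloc((len + 1) * sizeof *ws);
    if (ws == NULL)
        return -1;

    int rc = -1;
    if (mbstowcs(ws, in, len + 1) != (size_t)-1) {
        /* Same loop shape as before, but one step per wide character. */
        for (wchar_t *w = ws; *w; ++w)
            *w = (wchar_t)towlower((wint_t)*w);

        /* Convert back; succeed only if the terminator also fitted. */
        size_t m = wcstombs(out, ws, outsize);
        if (m != (size_t)-1 && m < outsize)
            rc = 0;
    }
    free(ws);
    return rc;
}
```

In the default "C" locale this behaves like the byte loop for ASCII input and fails on other bytes or leaves them alone, depending on the implementation. The non-ASCII handling only becomes useful once a suitable locale is set.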


Details:

  • *p is an expression that denotes the object that p points to, presumably a char.

  • As the continuation condition of the for loop, any non-zero char value that *p denotes counts as logical True, while the zero char value at the end of the string counts as logical False and ends the loop.

  • ++p advances the pointer to point to the next char.
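
As a rough illustration of the points above, the corrected loop can be wrapped in a function, next to a tiny helper showing what the cast does to a char value. Both names, ctype_arg and lower_bytes, are invented for this sketch, and 8-bit bytes are assumed.

```c
#include <ctype.h>

/* Hypothetical helper: the value that is safe to hand to tolower() for c.
   Converting to unsigned char maps a negative char into 0..UCHAR_MAX. */
int ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: the corrected loop wrapped in a function.
   p must point to a modifiable, null-terminated string. */
void lower_bytes(char *p)
{
    for ( ; *p; ++p)                            /* stop at the 0 terminator */
        *p = (char)tolower((unsigned char)*p);  /* never passes a negative value */
}
```

On a platform where char is signed, a byte such as 0xE9 is stored as a negative char. ctype_arg turns it back into 233, which is the kind of value tolower is specified to accept.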