I'm fairly new to programming and was just wondering why this code:
for ( ; *p; ++p) *p = tolower(*p);
works to convert a string to lowercase in C, when p points to a string?
In general, this code:
for ( ; *p; ++p) *p = tolower(*p);
does not reliably convert the string that p points to into lowercase.
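It does work for pure ASCII. However, plain char is a signed type on many platforms, and tolower requires an argument that is either representable as an unsigned char (so non-negative) or equal to the special value EOF. As soon as the string contains a byte whose char value is negative, such as a non-ASCII character, the code therefore has Undefined Behavior.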
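To see where the negative values come from, here is a small sketch. The helper names ctype_arg and char_is_signed are hypothetical; they only illustrate that a byte such as 0xE9 ('é' in Latin-1) can be negative when stored in a plain char, while converting it through unsigned char always gives a value in 0..UCHAR_MAX, which tolower accepts.

```c
#include <limits.h>

/* Hypothetical helper: turn a plain char into a value that is safe to pass
   to tolower, toupper, isalpha and the other <ctype.h> functions.
   Where char is signed, a byte such as 0xE9 is stored as a negative number;
   the conversion to unsigned char maps it back into 0..UCHAR_MAX. */
static int ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: nonzero if plain char is signed on this implementation. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

On a typical x86 compiler, (char)0xE9 is -23, and ctype_arg turns it back into 233.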
To avoid that, cast the argument to unsigned char, like this:
for ( ; *p; ++p) *p = tolower( (unsigned char)*p );
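Wrapped up as a reusable function, the fixed loop might look like the sketch below. The name str_tolower is hypothetical. It keeps the original string pointer in s so it can return it, and uses its own loop variable p.

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a NUL-terminated string in place and return it.
   The cast to unsigned char keeps tolower's argument in the range it is
   defined for, even when plain char is signed and the string contains bytes
   >= 0x80. What happens to such bytes depends on the current LC_CTYPE locale;
   in the default "C" locale only 'A'..'Z' are changed. */
char *str_tolower(char *s)
{
    for (char *p = s; *p; ++p)
        *p = (char)tolower((unsigned char)*p);
    return s;
}
```

For example, after char buf[] = "Hello, WORLD 42!"; str_tolower(buf); the buffer holds "hello, world 42!". Digits and punctuation pass through unchanged.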
Now it can work for single-byte encodings such as Latin-1, provided you have set the correct locale via setlocale, e.g. setlocale( LC_ALL, "" ). However, note that the very common UTF-8 encoding does not use a single byte per character. To handle UTF-8 text, you can convert it to a wide string and lowercase that.
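A minimal sketch of the wide-string approach uses the standard mbstowcs, towlower and wcstombs functions. The function name mb_tolower and the fixed 256-element wide buffer are illustrative assumptions, not part of any library. The sketch also assumes the program has already called setlocale(LC_ALL, "") (or similar) so that the locale's multibyte encoding matches the input, e.g. a UTF-8 locale for UTF-8 text.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercase a multibyte (e.g. UTF-8) string by converting
   it to a wide string, applying towlower to each wide character, and
   converting back. Assumes the current locale's encoding matches the input.
   Writes at most outsz bytes to out; returns 0 on success, -1 on failure. */
int mb_tolower(const char *in, char *out, size_t outsz)
{
    wchar_t wbuf[256];  /* illustrative fixed limit on input length */
    const size_t wmax = sizeof wbuf / sizeof wbuf[0];

    size_t n = mbstowcs(wbuf, in, wmax);
    if (n == (size_t)-1 || n == wmax)
        return -1;  /* invalid sequence for this locale, or input too long */

    for (size_t i = 0; i < n; ++i)
        wbuf[i] = (wchar_t)towlower((wint_t)wbuf[i]);

    size_t m = wcstombs(out, wbuf, outsz);
    if (m == (size_t)-1 || m == outsz)
        return -1;  /* unrepresentable character, or output buffer too small */
    return 0;
}
```

towlower maps one character to exactly one character. It therefore cannot perform Unicode case mappings that change the length of the text, but it covers ordinary letters such as 'É' to 'é' in a UTF-8 locale.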
Details:
*p is an expression that denotes the object that p points to, presumably a char.
As the continuation condition of the for loop, any non-zero char value that *p denotes counts as logical true, while the zero char value at the end of the string counts as logical false, which ends the loop.
++p advances the pointer to the next char.
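Putting those three pieces together, the one-line for loop behaves like the more explicit loop below. In this version, *p != '\0' spells out the truth test on *p. The function name lower_in_place is hypothetical, and the unsigned char cast from the fix above is included.

```c
#include <ctype.h>

/* Hypothetical explicit version of: for ( ; *p; ++p) *p = tolower(*p); */
void lower_in_place(char *p)
{
    while (*p != '\0') {                        /* stop at the terminating 0 char */
        *p = (char)tolower((unsigned char)*p);  /* rewrite the char p points to */
        ++p;                                    /* move on to the next char */
    }
}
```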
To unpick, let's assume p is a pointer to a char and, just before the for loop, it points to the first character in a string. In C, strings are typically modelled as a contiguous sequence of char values with a final 0 added at the end, which acts as the null terminator.
*p will evaluate to 0 once the string's null terminator is reached, and then the for loop will exit. (The second expression in the for loop acts as the termination test.)
*p = tolower(*p) sets the current character to lower case.
++p then advances to the next character in the string.
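One practical consequence of this walk-through is that the loop moves p itself. When the loop finishes, p points at the null terminator rather than at the start of the string. A minimal sketch that keeps a copy of the start pointer is shown below. The function name lower_and_report_end and its end out-parameter are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lowercase s in place. The loop advances p, so the
   start of the string is kept in s and returned. If end is not NULL, it
   receives the position where p stopped (the terminating '\0'). */
char *lower_and_report_end(char *s, char **end)
{
    char *p = s;
    for ( ; *p; ++p)
        *p = (char)tolower((unsigned char)*p);
    if (end != NULL)
        *end = p;  /* p now points at the terminating 0 char */
    return s;      /* s still points at the first char */
}
```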