I am reading some code that implements a simple parser. A function named scan breaks up a line into tokens. scan has a static variable bp that is assigned the line to be tokenized. Following the assignment, the whitespace is skipped over. See below. What I don't understand is why the code does a bitwise-and of the character that bp points to with 0xff, i.e., what is the purpose of * bp & 0xff? How is this:
while (isspace(* bp & 0xff))
    ++ bp;
different from this:
while (isspace(* bp))
    ++ bp;
Here is the scan function:
static enum tokens scan (const char * buf)
    /* return token = next input symbol */
{   static const char * bp;
    while (isspace(* bp & 0xff))
        ++ bp;
    ..
}
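From the C Standard (7.4 Character handling <ctype.h>):
"In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined."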
In this call
isspace(* bp)
the argument expression *bp, which has the type char, is converted to the type int by the integer promotions.
If the type char behaves as the type signed char and the value of the expression *bp is negative, then the promoted value of the type int is also negative and cannot be represented as a value of the type unsigned char. Unless that value happens to equal EOF, this results in undefined behavior.
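The rule quoted from the Standard can be written down as a small range check. The following is only a sketch: is_valid_ctype_arg and checked_isspace are hypothetical helper names introduced here for illustration, not part of <ctype.h> or of the original parser.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if v may legally be passed to isspace()
   and the other <ctype.h> functions, 0 otherwise. */
static int is_valid_ctype_arg(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Hypothetical debugging wrapper: trips an assertion instead of
   silently invoking undefined behavior. */
static int checked_isspace(int v)
{
    assert(is_valid_ctype_arg(v));
    return isspace(v);
}
```

On a platform where char is signed, a byte such as 0xE9 read through a plain char promotes to -23, which fails this check, whereas the same byte masked with 0xff gives 233, which passes.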
In this call
isspace(* bp & 0xff)
the bitwise AND operator guarantees that the value of the expression * bp & 0xff, which has the type int, lies in the range 0 to 255 and so can be represented as a value of the type unsigned char. So it is a trick used instead of writing clearer code such as
isspace((unsigned char) * bp)
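The two ways of making the argument safe can be compared side by side. This is a minimal sketch that assumes the usual case of an 8-bit char on a two's complement machine, where masking and casting produce the same value. The names mask_arg, cast_arg, and skip_space are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* The masking trick from the question: keep only the low 8 bits. */
static int mask_arg(char c)
{
    return c & 0xff;
}

/* The clearer spelling: convert to unsigned char before the promotion. */
static int cast_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: skip leading whitespace using the cast idiom. */
static const char *skip_space(const char *p)
{
    assert(p != NULL);
    while (isspace((unsigned char)*p))
        ++p;
    return p;
}
```

Both helpers map the char value -23 to 233, so either form hands isspace an argument in the permitted range. The cast states the intent directly and does not rely on char being 8 bits wide.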
The function isspace is usually implemented in such a way that it uses its argument of the type int as an index into a table with 256 values (from 0 to 255). If the argument has a value greater than the maximum value 255, or a negative value that is not equal to the value of the macro EOF, then the behavior of the function is undefined.
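To see why an out-of-range argument is dangerous with a table-based implementation, here is a toy version. It is only an illustration of the idea: toy_space_table and toy_isspace are invented names, the table covers the "C" locale whitespace characters only, and real C libraries lay out their tables differently.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Toy lookup table: one entry per unsigned char value, nonzero for the
   whitespace characters of the "C" locale. Not a real libc table. */
static const unsigned char toy_space_table[UCHAR_MAX + 1] = {
    [' '] = 1, ['\t'] = 1, ['\n'] = 1, ['\v'] = 1, ['\f'] = 1, ['\r'] = 1
};

static int toy_isspace(int c)
{
    if (c == EOF)
        return 0;
    /* Without this check, a negative c or a c above UCHAR_MAX would
       index outside the table. */
    assert(c >= 0 && c <= UCHAR_MAX);
    return toy_space_table[c];
}
```

A call such as toy_isspace(-23) would read before the start of the array if the assertion were removed, which is the kind of out-of-bounds access the Standard's wording allows an implementation not to guard against.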