Issues with the signedness of char


According to the standard, whether char is signed or not is implementation-defined. This has caused me some trouble. Following are some examples:

1) Testing the most significant bit. If char is signed, I can simply check whether the value is less than 0. If it is unsigned, I have to check whether it is at least 128 instead. Neither of these simple tests is generic enough to work in both cases. To write portable code, it seems I have to manipulate the bits directly, which is not neat.
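The dilemma can be made concrete with a small sketch that picks whichever of the two tests applies, at compile time. The name msb_set is hypothetical, the code assumes C++17 (for if constexpr), and the signed branch assumes two's complement, which C++20 requires.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: applies whichever of the two tests from the question
// matches this platform's plain char.
constexpr bool msb_set(char c) {
    if constexpr (std::numeric_limits<char>::is_signed) {
        return c < 0;  // signed: the MSB is the sign bit
    } else {
        return c >= (1 << (CHAR_BIT - 1));  // unsigned: 128 or more when CHAR_BIT == 8
    }
}
```

This works, but it still needs two code paths, which is exactly the lack of neatness the question complains about.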

2) Value assignment. Sometimes I need to write a bit pattern to a char variable. If char is unsigned, this is easy with hexadecimal notation, e.g., char c = 0xff. But this does not work when char is signed. Take char c = 0xff as an example: 0xff is beyond the maximum value a signed char can hold (127 for an 8-bit char). In such cases, the standard says the resulting value of c is implementation-defined.

So, does anybody have good ideas about these two issues? Regarding the second one, I'm wondering whether char c = '\xff' is OK for both signed and unsigned char.

NOTE: It is sometimes necessary to write explicit bit patterns to characters. See the example at http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs.

Tony Delroy (best answer)

1) Testing the MSB: (x | 0x7F) != 0x7F (or reinterpret_cast<unsigned char&>(x) & 0x80)

2) Setting a bit pattern: reinterpret_cast<unsigned char&>(x) = 0xFF;

Note that reinterpret_cast is entirely appropriate if you want to treat the memory the character occupies as a collection of bits, bypassing the specific bit patterns associated with any given value in the char type.
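A minimal sketch of both casts wrapped in functions, assuming an 8-bit char so that the MSB is 0x80. The names msb_set and set_bits are hypothetical. Reading or writing a char through an unsigned char reference is permitted by the aliasing rules, so this only reinterprets the storage.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "0x80 is the MSB only for 8-bit char");

// Test the MSB by viewing the char's storage as unsigned char.
bool msb_set(const char& x) {
    return (reinterpret_cast<const unsigned char&>(x) & 0x80) != 0;
}

// Write an explicit bit pattern through the unsigned char view.
void set_bits(char& x, unsigned char pattern) {
    reinterpret_cast<unsigned char&>(x) = pattern;
}

// Usage (illustrative):
//   char c = 0;
//   set_bits(c, 0xFF);
//   assert(msb_set(c));
```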

Thomas

Actually you can do what you want without worrying about signedness.

Hexadecimal notation describes a bit pattern, not an integral value (see the disclaimer below).

So for 2, you said you can't assign bit patterns like this:

char c = 0xff

but you really can do that, whether char is signed or not.

For 1, you may not be able to use the "compare with 0" trick, but you still have several ways to check the most significant bit. One way is to shift right by 7, shifting in zeros on the left, and then check whether the result equals 1. That is independent of signedness.

As Tony D pointed out, (x | 0x7F) != 0x7F is more portable than shifting, because right-shifting a negative value may not shift in zeros. Similarly, you could use (x & 0x80) == 0x80; the parentheses are required because == binds more tightly than &.
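The three MSB tests discussed here can be sketched side by side, assuming an 8-bit char. The function names are hypothetical. For the shift variant, the value is converted to unsigned char first, so zeros really are shifted in regardless of signedness.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "these masks assume 8-bit char");

// OR-based test (Tony D): if only the low 7 bits were set, the result is 0x7F.
// A negative signed char promotes to a negative int with high bits set,
// so the comparison correctly reports the MSB as set.
constexpr bool msb_or(char x) { return (x | 0x7F) != 0x7F; }

// AND-based test; note the parentheses around x & 0x80.
constexpr bool msb_and(char x) { return (x & 0x80) == 0x80; }

// Shift-based test: convert to unsigned char first so the shift brings in zeros.
constexpr bool msb_shift(char x) {
    return (static_cast<unsigned char>(x) >> 7) == 1;
}
```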

Of course you can also do what Brian suggested and just use an unsigned char.

Disclaimer: Tony pointed out that 0xFF is actually an int, and the conversion to char is implementation-defined when char is signed and cannot hold the value. However, in practice no implementation deviates here: char c = 0xFF, whether char is signed or not, will set all the bits, trust me. It would be extremely difficult to find an implementation that doesn't do that.
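To illustrate the disclaimer, here is a sketch assuming an 8-bit char and C++17 or later. Since C++20, converting an out-of-range integer to a signed type is defined as reduction modulo 2^N, so the result below is guaranteed. Before C++20 it was implementation-defined, though mainstream compilers wrap the same way. Compilers may warn about the narrowing initialiser when char is signed.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "0xFF is all-bits-one only for 8-bit char");

// The initialiser 0xFF is an int. A signed char ends up holding -1,
// an unsigned char holds 255; both are the same all-ones bit pattern.
constexpr char c = 0xFF;

// Viewing the value as unsigned char recovers the pattern in both cases.
static_assert(static_cast<unsigned char>(c) == 0xFF);
```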

Vul

You can OR the value with 0x7F to detect the top bit, and AND it with 0xFF to strip its signedness.
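A minimal sketch of this suggestion, assuming an 8-bit char. The names top_bit_set and unsigned_value are hypothetical. The char is promoted to int before either operation, which is why the AND recovers the unsigned value 0..255.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "the 0x7F/0xFF masks assume 8-bit char");

// OR with 0x7F: the result differs from 0x7F only when the top bit is set.
// (A negative signed char promotes to a negative int with high bits set.)
constexpr bool top_bit_set(char x) { return (x | 0x7F) != 0x7F; }

// AND with 0xFF: keep only the low 8 bits of the promoted int, which yields
// the value 0..255 whether or not char is signed.
constexpr int unsigned_value(char x) { return x & 0xFF; }
```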

user207421

If you really care about the signedness, just declare the variable as signed char or unsigned char as needed. No platform-independent bit-twiddling tricks required.
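As a sketch of this advice, assuming 8-bit bytes: once byte data is declared as unsigned char, 0xFF is simply 255 and "at least 0x80" is the MSB test. The functions msb_set and count_high_bytes are hypothetical examples, not part of any library.

```cpp
#include <cstddef>

// With unsigned char there is only one case to handle.
constexpr bool msb_set(unsigned char b) { return b >= 0x80; }

// Count bytes with the high bit set, e.g. non-ASCII bytes in a UTF-8 buffer.
constexpr std::size_t count_high_bytes(const unsigned char* p, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (msb_set(p[i])) ++count;
    }
    return count;
}
```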

MSalters

The easiest way to test the MSB is to make it the LSB: char c = foo(); if ((c>>(CHAR_BIT-1)) & 1) ...
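The shift-then-mask test can be wrapped in a function, sketched here under the assumption of C++14 or later. The name msb_set is hypothetical. The char is promoted to int before the shift. For a negative signed char, the right shift is arithmetic (guaranteed since C++20, implementation-defined before). Bit 0 of the result is therefore still the original top bit, and the & 1 discards everything else.

```cpp
#include <climits>

// Shift the MSB down to bit 0 and mask it off; works for any CHAR_BIT.
constexpr bool msb_set(char c) {
    return ((c >> (CHAR_BIT - 1)) & 1) != 0;
}
```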

Setting a specific bit pattern is a bit more tricky. All-bits-one, for instance, is not necessarily 0xff; it could also be 0x7ff or, more realistically, 0xffff. Regardless, ~char(0) is all-bits-one. Somewhat less obviously, so is char(-1). If char is signed, that's clear; if it is unsigned, this is still correct because unsigned types work modulo 2^N. Following that logic, with an 8-bit char, char(-128) sets just the high bit (0x80) whether or not char is signed; with a wider char it also sets every bit above bit 7.
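These claims can be checked at compile time, as in the sketch below, assuming C++11 or later and two's complement for signed char (required since C++20). The constants ones_a and ones_b are hypothetical names. Note that ~char(0) is an int, so it is converted back to char explicitly.

```cpp
#include <climits>

// ~char(0): char(0) promotes to int 0, and ~0 is an all-ones int (-1);
// converting that back to char gives the same result as char(-1).
constexpr char ones_a = static_cast<char>(~char(0));

// char(-1): -1 if char is signed (all ones in two's complement);
// if unsigned, -1 is reduced modulo 2^CHAR_BIT to UCHAR_MAX.
constexpr char ones_b = char(-1);

static_assert(static_cast<unsigned char>(ones_a) == UCHAR_MAX, "all bits set");
static_assert(static_cast<unsigned char>(ones_b) == UCHAR_MAX, "all bits set");

// char(-128) is exactly the high bit (0x80) only when CHAR_BIT == 8.
static_assert(CHAR_BIT != 8 || static_cast<unsigned char>(char(-128)) == 0x80,
              "-128 maps to 0x80 for 8-bit char");
```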