I want to convert a hexadecimal representation in a string to an unsigned char variable like this:
std::stringstream ss;
uint8_t x;
ss << "1f";
ss >> std::hex >> x; // result: x = 0x31 (=49 in decimal and ='1' as char)
I assumed that the conversion would yield x = 0x1f (= 31 in decimal), since 0x1f is less than 0xff, the maximum value that fits in an 8-bit unsigned char. What happened instead is that only the first 8 bits of my string (the character '1') were used in the conversion.
Can someone explain to me why exactly that happened and how to fix it?
std::uint8_t is (typically, see below) an alias for unsigned char, and the corresponding operator>> treats it as a character type rather than an integer type. Because of this, the character '1' is read into x, and its ASCII value is 49. That the hexadecimal notation of the ASCII value of '1' happens to be the decimal notation of the value you wanted to parse is coincidental; attempting to parse "1e" or "10" or "1xyz" would still result in x == 49.

To work around this problem, parse into another integer type first, then narrow to 8 bits.
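The workaround described above can be sketched roughly as follows. The helper name parse_hex_byte and its bool-returning interface are assumptions made for illustration, not something from the question; the range check is an extra safeguard so that inputs such as "100" are rejected instead of being silently truncated.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (name and interface are illustrative assumptions).
// Parses a hexadecimal string into an 8-bit value. Returns false if the
// extraction fails or the parsed value does not fit into 8 bits.
bool parse_hex_byte(const std::string& text, std::uint8_t& out) {
    std::istringstream ss(text);

    // Extract into a genuine integer type so that std::hex is honored;
    // extracting directly into uint8_t would read a single character.
    unsigned int wide = 0;
    if (!(ss >> std::hex >> wide) || wide > 0xff) {
        return false;
    }

    // The value is known to fit, so narrowing to 8 bits is safe here.
    out = static_cast<std::uint8_t>(wide);
    return true;
}
```

With this sketch, parse_hex_byte("1f", x) stores 0x1f (31) in x. Note that the same character-type treatment applies on output: to print x as a number, cast it to a wider integer type first, for example static_cast<unsigned int>(x).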
Pedantic Addendum (mostly of historical interest)
If we're being perfectly pedantic, uint8_t is an optional (!) implementation-defined unsigned integer type that is exactly 8 bits wide if it exists. C++ defers the definition to the C standard in [cstdint.syn]/2, and C99 defines the exact-width types in 7.18.1.1: uintN_t designates an unsigned integer type with width N, and these types are optional.

The background for this is history. Once upon a time, there existed platforms on which a byte did not have 8 bits, such as a number of PDPs (to say nothing of decimal computers like the early UNIVACs¹). These are rarely of interest to us today, but they were important when C was designed, and as a consequence certain assumptions that would perhaps be made if C were developed today are not made in the C standard.
On these platforms, 8-bit integer types could not always be easily provided, and unsigned char, being defined as exactly one byte wide, cannot at the same time be exactly 8 bits wide if a byte is not 8 bits wide. This, along with a few other things², is why all uintN_t types are optional, and also why none of them are tethered to specific integer types. The intent was to define types that offer a specific low-level behavior. If the implementation couldn't provide that behavior, at least it would error out rather than compile nonsense.

So, being perfectly pedantic: If you use uint8_t at all, it is possible to write a conforming C++ implementation that rejects your code altogether. It is also possible to write a conforming implementation in which uint8_t is an integer type distinct from unsigned char, where the code in the question just works.

In practice, however, you are unlikely to encounter such an implementation. All current C++ implementations of which I'm aware define uint8_t as an alias of unsigned char.³

¹ And even that's not the depth of the rabbit hole, although I doubt the creators of C had the Setun (a Russian balanced-ternary computer) in mind.
² Not all those machines represented integers as two's complement, for example.
³ If you know of one that doesn't, leave a comment and I'll make a note of it here. I suppose it's possible that there's a microcontroller toolkit out there that has reasons to deviate.