Convert hex string into unsigned char in C++


I want to convert a hexadecimal representation in a string to an unsigned char variable like this:

std::stringstream ss;
uint8_t x;
ss << "1f";
ss >> std::hex >> x;  // result: x = 0x31 (=49 in decimal and ='1' as char)

I assumed that the conversion would yield x = 0x1f (=31 in decimal), since 0x1f is less than 0xff, the maximum value that can be stored in an 8-bit unsigned char. What happened instead is that only the first 8 bits of my string (the first character) were used in the conversion.

Can someone explain to me why exactly that happened and how to fix it?


There is 1 answer

Wintermute On BEST ANSWER

std::uint8_t is (typically, see below) an alias for unsigned char, and the corresponding operator>> treats it as a character type rather than an integer type. Because of this, the character '1' is read into x, and its ASCII value is 49. That the hexadecimal notation of the ASCII value of '1' happens to be the decimal notation of the value you wanted to parse is coincidental; attempting to parse "1e" or "10" or "1xyz" would still result in x == 49.
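To see the character-type behavior in isolation, here is a minimal sketch. The helper name extract_as_uint8 is made up for illustration, and the result assumes an ASCII platform on which uint8_t is an alias for unsigned char:

```cpp
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extracts into a uint8_t exactly as the question does.
// Where uint8_t is an alias for unsigned char, operator>> selects the
// character overload, reads one character, and std::hex has no effect.
std::uint8_t extract_as_uint8(const std::string& text) {
    std::istringstream ss(text);
    std::uint8_t x = 0;
    ss >> std::hex >> x;  // reads only the first non-whitespace character
    return x;
}
```

Under those assumptions, extract_as_uint8("1f"), extract_as_uint8("1e") and extract_as_uint8("1xyz") all return the character code of '1'.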

To work around this problem, parse into another integer type first, then narrow to 8 bits:

std::stringstream ss;
uint8_t x;
unsigned tmp;

ss << "1f";
ss >> std::hex >> tmp; 
x = tmp;                // may need static_cast<uint8_t>(tmp) to suppress
                        // compiler warnings.
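If you want the workaround as a reusable function, one possible shape is sketched below. The name parse_hex_byte and the choice to reject values above 0xFF (rather than silently truncate them) are my own assumptions, and std::optional requires C++17:

```cpp
#include <cstdint>
#include <ios>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parses a hexadecimal string into a single byte.
// Returns std::nullopt if no hex digits could be read or if the value
// does not fit into 8 bits.
std::optional<std::uint8_t> parse_hex_byte(const std::string& text) {
    std::istringstream ss(text);
    unsigned tmp = 0;

    // Extract into a real integer type so that std::hex is honored.
    if (!(ss >> std::hex >> tmp)) {
        return std::nullopt;
    }

    // Refuse to narrow values that would not survive the conversion.
    if (tmp > 0xFF) {
        return std::nullopt;
    }

    return static_cast<std::uint8_t>(tmp);
}
```

With this sketch, parse_hex_byte("1f") holds 0x1f, while parse_hex_byte("100") and parse_hex_byte("xyz") are empty. Note that trailing garbage after valid digits (as in "1xyz") is not detected here; the stream simply stops at the first non-hex character.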

Pedantic Addendum (mostly of historical interest)

If we're being perfectly pedantic, uint8_t is an optional (!) implementation-defined unsigned integer type that is exactly 8 bits wide if it exists. C++ defers the definition to the C standard in [cstdint.syn]/2, and C99 defines in 7.18.1.1:

1 The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with width N. Thus, uint24_t denotes an unsigned integer type with a width of exactly 24 bits.

3 These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, it shall define the corresponding typedef names.

The background for this is history. Once upon a time, there existed platforms on which a byte did not have 8 bits, such as a number of PDPs (to say nothing of decimal computers like the early UNIVACs¹). These are rarely of interest to us today, but they were important when C was designed, and as a consequence certain assumptions that would perhaps be made if C were developed today are not made in the C standard.

On these platforms, 8-bit integer types could not always be easily provided, and unsigned char, being defined as exactly one byte wide, cannot at the same time be exactly 8 bits wide if a byte is not 8 bits wide. This, along with a few other things², is why all uintN_t types are optional, and also why none of them are tethered to specific integer types. The intent was to define types that offer a specific low-level behavior. If the implementation couldn't provide that behavior, at least it would error out rather than compile nonsense.

So, being perfectly pedantic: If you use uint8_t at all, it is possible to write a conforming C++ implementation that rejects your code altogether. It is also possible to write a conforming implementation in which uint8_t is an integer type distinct from unsigned char, where the code in the question just works.

In practice, however, you are unlikely to encounter such an implementation. All current C++ implementations of which I'm aware define uint8_t as an alias of unsigned char.³
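If you would like to know what your own implementation does, a compile-time check along these lines should tell you. This is only a sketch; the constant name uint8_is_unsigned_char is made up for illustration:

```cpp
#include <cstdint>
#include <type_traits>

// True where uint8_t is an alias of unsigned char, which is the case in
// which stream extraction treats it as a character type. If the
// implementation does not provide uint8_t at all, this fails to compile.
constexpr bool uint8_is_unsigned_char =
    std::is_same<std::uint8_t, unsigned char>::value;
```

You could feed the constant to a static_assert if your code relies on one behavior or the other.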

¹ And even that's not the depth of the rabbit hole, although I doubt the creators of C had the Setun (a Russian balanced-ternary computer) in mind.

² Not all those machines represented integers as two's complement, for example.

³ If you know of one that doesn't, leave a comment and I'll make a note of it here. I suppose it's possible that there's a microcontroller toolkit out there that has reasons to deviate.