How can an unsigned char, for example, take values from -128
to +127
? From what I understand the most significant bit is used to represent the sign of the number, and the remaining bits of a char are used to represent the magnitude of the number. Now, the largest possible magnitude for 7 bits is 127
, so shouldn't the range be from -127
to +127
? How can -128
be a result?
Secondly, what is the bit-level logic behind the following behavior
#include <stdio.h>
int main()
{
signed char x = 127;
x += 1;
printf("%i", x);
}
Output:
-128
As one can see, x
becomes -128
, but why? What is the arithmetic behind this behavior?
This works based on something called Two's Complement. The idea here, is that given some binary number, it's two complement will be it's One Complement (flip all bits) plus one. We can see a simple example, let's find the two's complement of
13
, which we can write as0b01101
.01101 (flip) -> 10010 (+1) --> 10011
Now, although if we interpreted that as a binary number as usual we would read
19
in decimal, we must know that the number is written in Two's Complement in order to reverse the procedure and arrive at the previous number13
. So, from this we can see that we have represented things such that+13 = 01101
and-13 = 10011
, notice that the positive number starts with a0
and it's symmetric with a1
. This will be a constant when using this representation, positive numbers will always begin with a0
, and negative ones with a1
. Something else that is worth noting is that I prefixed a0
to my original representation of13
, that will be needed in order to correctly represent it's two's complement. You can try going through the same example without doing that and verifying it's necessity.Now, let's take a look at a few values represented like this,
As you can see, it works just as we had previously intended, however you can now begin to understand how the "bug" you found happens. The upper limit for a 4-bit representation in Two's Complement is the decimal value
3
. Let's see how we would reach-4
by simply adding1
to it.3 = 0b011
therefore3+1 = 0b100
, which as you can see from the table maps to-4
(as opposed to4
) on Two's Complement. Your case was this exact problem, but with more bits. Signed representation like this is circular, so overflowing on the top yields the bottom value. Let's look at your caseAs you can see it starts with a
1
, therefore it is negative (!) and if you solve the Two's Complement you will see it represents -128 (as the lower bound is always larger than the upper bound).It's given that not every hardware will implement things in the same way, Intel, AMD, ARM and, as far as I know, all major architectures for general purpose CPUs use Two's complement in their ALUs, but there is hardware that uses other techniques for implementing signing of integers, so fundamentally the behavior you described is undefined. Another interesting thing to notice is that IEEE's standard for floating point arithmetic, implements an exponent bias based signed float.
Finally, since we are talking about C here, do note that undefined behavior can be optimized by the compiler, one great example of such optimizations is described in this blog post.