Arithmetic with unsigned variables in C

816 views Asked by At

How can an unsigned char, for example, take values from -128 to +127? From what I understand the most significant bit is used to represent the sign of the number, and the remaining bits of a char are used to represent the magnitude of the number. Now, the largest possible magnitude for 7 bits is 127, so shouldn't the range be from -127 to +127? How can -128 be a result?

Secondly, what is the bit-level logic behind the following behavior

#include <stdio.h>

int main()
{
    signed char x = 127;
    x += 1;
    printf("%i", x);
}

Output:

-128

As one can see, x becomes -128, but why? What is the arithmetic behind this behavior?

2

There are 2 answers

5
Bernardo Meurer On

This works based on something called Two's Complement. The idea here, is that given some binary number, it's two complement will be it's One Complement (flip all bits) plus one. We can see a simple example, let's find the two's complement of 13, which we can write as 0b01101. 01101 (flip) -> 10010 (+1) --> 10011

Now, although if we interpreted that as a binary number as usual we would read 19 in decimal, we must know that the number is written in Two's Complement in order to reverse the procedure and arrive at the previous number 13. So, from this we can see that we have represented things such that +13 = 01101 and -13 = 10011, notice that the positive number starts with a 0 and it's symmetric with a 1. This will be a constant when using this representation, positive numbers will always begin with a 0, and negative ones with a 1. Something else that is worth noting is that I prefixed a 0 to my original representation of 13, that will be needed in order to correctly represent it's two's complement. You can try going through the same example without doing that and verifying it's necessity.

Now, let's take a look at a few values represented like this,

╔══════╦════════════════╦════════════════════════╗
║ Bits ║ Unsigned Value ║ Two's Complement Value ║
╠══════╬════════════════╬════════════════════════╣
║ 011  ║ 3              ║ 3                      ║
╠══════╬════════════════╬════════════════════════╣
║ 010  ║ 2              ║ 2                      ║
╠══════╬════════════════╬════════════════════════╣
║ 001  ║ 1              ║ 1                      ║
╠══════╬════════════════╬════════════════════════╣
║ 000  ║ 0              ║ 0                      ║
╠══════╬════════════════╬════════════════════════╣
║ 111  ║ 7              ║ -1                     ║
╠══════╬════════════════╬════════════════════════╣
║ 110  ║ 6              ║ -2                     ║
╠══════╬════════════════╬════════════════════════╣
║ 101  ║ 5              ║ -3                     ║
╠══════╬════════════════╬════════════════════════╣
║ 100  ║ 4              ║ -4                     ║
╚══════╩════════════════╩════════════════════════╝

As you can see, it works just as we had previously intended, however you can now begin to understand how the "bug" you found happens. The upper limit for a 4-bit representation in Two's Complement is the decimal value 3. Let's see how we would reach -4 by simply adding 1 to it. 3 = 0b011 therefore 3+1 = 0b100, which as you can see from the table maps to -4 (as opposed to 4) on Two's Complement. Your case was this exact problem, but with more bits. Signed representation like this is circular, so overflowing on the top yields the bottom value. Let's look at your case

127 = 0b01111111
127 + 1 = 0b10000000

As you can see it starts with a 1, therefore it is negative (!) and if you solve the Two's Complement you will see it represents -128 (as the lower bound is always larger than the upper bound).

It's given that not every hardware will implement things in the same way, Intel, AMD, ARM and, as far as I know, all major architectures for general purpose CPUs use Two's complement in their ALUs, but there is hardware that uses other techniques for implementing signing of integers, so fundamentally the behavior you described is undefined. Another interesting thing to notice is that IEEE's standard for floating point arithmetic, implements an exponent bias based signed float.

Finally, since we are talking about C here, do note that undefined behavior can be optimized by the compiler, one great example of such optimizations is described in this blog post.

1
AnT stands with Russia On

In C the behavior of operator += is defined though equivalent combination of = and + operators. E.g. your x += 1 by definition stands for x = x + 1. Since x has narrow type signed char, it is promoted to int before any arithmetic begin. This means that subexpression x + 1 is evaluated in the domain of type int. After that the result (of type int) is converted back to signed char and stored back into x.

Thus, in your case your x += 1 is actually equivalent to

x = (signed char) ((int) x + 1);

The (int) x + 1 subexpression does not overflow. It successfully produces value 128 of type int. However, this value does not fit into the range of signed char type, which leads to implementation-defined behavior when this value is converted back to signed char type. On your platform this implementation-defined behavior produces the value -128 of type signed char.