Why does printf("%c", 0.21) result in 'ß'?


So, here's my C code. I was toying with it in order to learn the format specifiers, especially %c:

#include <stdio.h>

void main()
{
    double a = 0.21;
    printf("%c", a);
}

I happened to notice that even if I pass a floating-point value as the argument for the format specifier %c in the printf() function, the compiler (my compiler is gcc) somehow converts the value 0.21 into the decimal value 223, which corresponds to the ASCII character ß.

For a = 0.22 the output is: ) whose decimal value in ASCII table is 29

I ran the code in both the VS Code and CLion IDEs, but the results were the same. It has been making me scratch my head for days and I can't figure it out.

I want to know how the values 0.21 and 0.22 get converted into the decimal values 223 and 29, i.e. how they end up corresponding to the ASCII characters ß and ) respectively.

Since the values 0.21 and 0.22 do not correspond to any ASCII character, I was expecting the program to print nothing.

But based on the output I thought that this might have something to do with the binary representations.
As 0.21 in binary is 0.00110101110000101000111101011100...
& 223 is 11011111
and 0.22 in binary is 0.00111000010100011110101110000101...
& 29 is 00011101

And I could not find any conversion pattern.


There are 4 answers

chqrlie (BEST ANSWER)

Passing a double (or a float, which is automatically converted to double in a variadic call) for a %c has undefined behavior. printf looks for the argument where your program would have passed it if its type had been int, or a smaller type that promotes to int, and uses that value. In your case the value happens to be 223 or 29 depending on circumstances, and it could be something else on a different CPU, after any unrelated change in the program, or even just at a different time or place. The behavior is undefined: you could also get no output or a program crash (unlikely but not impossible).

Use compiler warnings to try and detect such problems (gcc -Wall -Wextra -Werror) and avoid scratching your head trying to make sense of undefined behavior.
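As a side note (not part of the original answer), here is a minimal sketch of two well-defined alternatives, assuming you simply want to see either the numeric value or a character:

#include <stdio.h>

int main(void)
{
    double a = 0.21;

    printf("%f\n", a);    /* well defined: %f matches a double argument */

    int code = 65;        /* an arbitrary character code ('A' in ASCII) */
    printf("%c\n", code); /* well defined: %c matches an int argument */

    return 0;
}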

Top-Master

Passing a float or double for a %c has undefined behavior. Period.

It wouldn't be "undefined behavior" if we could define "why" 0.21 results in 'ß'.

But we can assume that maybe your printf implementation is binary-casting (reinterpreting) the value as an int, and that the resulting value, depending on the CPU architecture, happens to come out as 'ß'.
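To illustrate what such a reinterpretation of the bits could look like in a well-defined way, here is a sketch added for clarity (not from the original answer), assuming double is an 8-byte IEEE-754 binary64 value:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    double a = 0.21;
    uint64_t bits;

    /* Copy the object representation of the double into an integer
       (well defined, unlike mismatching printf arguments). */
    memcpy(&bits, &a, sizeof bits);

    printf("encoding of 0.21: 0x%016llx\n", (unsigned long long)bits);
    printf("low byte:         %u\n", (unsigned)(bits & 0xFF)); /* 225 (0xE1) on such systems */

    return 0;
}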

KamilCuk

my compiler is gcc

Most probably you are working on an x86-64 system. This architecture has separate floating-point and general-purpose registers. When printf tries to fetch the argument for %c it reads a general-purpose register, but passing a sets a floating-point register.

The value printed for %c is unrelated to a. Most probably it is some leftover register value from the startup code that runs before main, such as _start, or from environ.
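A small sketch (not part of the original answer) of what printf conceptually does for %c may make this concrete: the callee asks the va_list for an int, regardless of what the caller actually passed.

#include <stdarg.h>
#include <stdio.h>

/* Toy "%c-like" consumer: it fetches an int from the variable arguments,
   just as printf does for %c. If the caller had passed a double instead,
   that double would sit in a floating-point register on x86-64 while this
   read would come from an unrelated place - undefined behavior. */
static void print_char_arg(const char *tag, ...)
{
    va_list ap;
    va_start(ap, tag);
    int c = va_arg(ap, int);    /* what %c makes printf do */
    va_end(ap);
    printf("%s: '%c' (code %d)\n", tag, c, c);
}

int main(void)
{
    print_char_arg("int passed", 65);          /* well defined: prints 'A' */
    /* print_char_arg("double passed", 0.21);     undefined, like printf("%c", 0.21) */
    return 0;
}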

Eric Postpischil

For a = 0.22 the output is: ) whose decimal value in ASCII table is 29

That is incorrect. The ASCII code for “)” is 41 in decimal. It is 29 in hexadecimal. And 41 (0x29) is the low byte of the bits used to encode 0.22 in double, as shown below. Further, the character you are seeing for 0.21, “ß”, is encoded as 225 (0xE1) in some character encoding systems, and 225 is the low byte of the bits used to encode 0.21 in double.

As a simple test, you can set a to 0x1p52 + x, where x is any integer from 0 to 255, inclusive. Because of the way double numbers are encoded, 0x1p52 + x will produce an encoding with x in the low byte. For example, if this is what is happening on your system, using a = 0x1p52 + 88 will print “X”, the character with ASCII code 88.
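A well-defined way to check this (a sketch added here, not from the original answer, assuming a little-endian IEEE-754 binary64 double) is to look at the bytes directly instead of relying on the undefined printf call:

#include <stdio.h>
#include <string.h>

int main(void)
{
    double a = 0x1p52 + 88;           /* 2^52 plus the code for 'X' */
    unsigned char bytes[sizeof a];

    memcpy(bytes, &a, sizeof a);      /* inspecting the encoding is well defined */

    /* On a little-endian IEEE-754 binary64 system, bytes[0] is the low
       byte of the significand and therefore holds 88. */
    printf("low byte = %u -> '%c'\n", bytes[0], bytes[0]);

    return 0;
}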

Here is how the double resulting from 0.22 in source code is represented in the format most commonly used for double, IEEE-754 binary64:

  • 0.22 in binary is 0.00111000010100011110101110000101000111101011100001010001111010111000010100011110101110000101000111101011100001010001111010111…
  • We normalize it by separating an exponent factor, 2^e, and a significand (the fraction portion of a floating-point representation), with the exponent selected so the significand is at least 1 but less than 2 (10 in binary):¹ 2^−3 × 1.11000010100011110101110000101000111101011100001010001111010111000010100011110101110000101000111101011100001010001111010111…
  • The format uses 53-bit significands, so the significand is rounded to 53 bits. The first 53 bits of the significand are 1.1100001010001111010111000010100011110101110000101000. The next bits begin 111101011… Since these begin with 1 and contain further 1 bits, the trailing portion is more than ½ of the least significant of the 53 bits, so the number is rounded up. The significand that will be encoded is 1.1100001010001111010111000010100011110101110000101001.
  • The number is encoded using three fields: A bit to indicate the sign, 11 bits for the exponent, and 52 bits for the significand.
  • The sign bit is 0, to indicate “+”.
  • The exponent is encoded by adding 1023 and converting to 11 binary digits. −3 + 1023 = 1020, which is 01111111100 in binary.
  • The leading bit of the significand is known to be one from the exponent encoding (the reserved code of zero is used for subnormal significands). The remaining 52 bits of the significand are used for the significand field.
  • So the resulting bits are 0 01111111100 1100001010001111010111000010100011110101110000101001.
  • Formed into groups of eight bits, this is 00111111 11001100 00101000 11110101 11000010 10001111 01011100 00101001.
  • In decimal, that is 63, 204, 40, 245, 194, 143, 92, 41.
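These bytes can be confirmed with a short, well-defined program (a sketch added for illustration, assuming a little-endian IEEE-754 system so the loop prints the most significant byte first):

#include <stdio.h>
#include <string.h>

int main(void)
{
    double a = 0.22;
    unsigned char bytes[sizeof a];

    memcpy(bytes, &a, sizeof a);   /* well-defined view of the encoding */

    /* Print from the most significant byte down to the least significant,
       matching the grouping above: 63 204 40 245 194 143 92 41. */
    for (size_t i = sizeof a; i-- > 0; )
        printf("%u ", (unsigned)bytes[i]);
    printf("\n");

    return 0;
}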

Observe the last byte is 41, matching the “)” you saw. One behavior that is seen when %c is incorrectly used is that the value from one byte of the passed argument is used. For example, the calling routine may pass the eight bytes used to encode a double number, but printf expected just the four bytes of an int (%c takes an int value, of which it effectively prints one byte). So printf took four bytes from where it expected to find an int, converted those four bytes to unsigned char, and printed the resulting character. The C standard does not define this behavior, but it is a potential result.

For 0.21, the bytes of the encoded double are 63, 202, 225, 71, 174, 20, 122, 225. You reported 223, not 225, but I suspect your system is using a character encoding in which “ß” is 225, not 223. For example, code page 852 has “ß” in the 0xE1 (225) spot.

It is a little unusual for the bytes of a double to be taken for an integer argument these days. It essentially requires passing arguments on the stack instead of in processor registers, as today’s processors commonly use separate registers for floating-point and integer types.

Footnote

¹ The format has limits on how high or low the exponent can go. These are not applicable here and so are not covered in detail. When the exponent is too high, the conversion to double results in a representation of infinity. When the exponent is too low, the significand will not be normalized, and a special exponent code is used to represent a subnormal value.