Converting a Decimal to a Denormalized IEEE 754 Half-Precision Binary Number

I am trying to convert 0.0000211 to binary. Here is what I understand so far:

E = -bias + 1. bias = 15, E = -14

Sign bit and exponent = 0.

So I have:

0 00000 ??????????

The half-precision format has 1 sign bit, 5 exponent bits, and 10 fraction bits.

My question is: how can I find the fraction of this denormalized number? What do E and the bias mean in this context? Any help would be appreciated.

Note: I need to be able to do this manually for my final.

There are 2 answers

Clayton Mills (BEST ANSWER)

The mantissa (the OP's ? bits) of a half, float or double is normalized to remove leading zeros. Usually the number is shifted until 1.0 <= number < 2.0. Here, however, the number is in the subnormal range: the exponent field is 00000, as you've already established, which means the original number is smaller than the minimum normal value of 6.10352 × 10^−5 (2^−14). In other words, while shifting to reach 1.0 <= number < 2.0 you hit the exponent's lower limit. In that case the shift stops at 14 places, i.e. the number is multiplied by 2^14, and as many bits after the point as fit are stored (10 bits for half floats). This lets the format hold very small numbers: in the subnormal range there is an implicit 0. (instead of 1.) in front of the mantissa when the number is restored, and leading zeros in the mantissa are allowed.

So 0.0000211 = b'0.000000000000000101100001111111111100111...

2^14 * 0.0000211 = 0.3457024 = b'0.0101100001111111111100111...

The first 10 bits after the point are 0101100001. The bits that follow (1111111111...) are more than half a unit in the last place, so rounding to nearest gives 0101100010 (plain truncation would give 0101100001). The implicit 0. is not stored (i.e. for 0.XXXXXXXXXX we only store the Xs).

So in this case the mantissa (the OP's ? bits) is 0101100010

sign   exp      mantissa
0      00000    0101100010

This can be checked with python using numpy and struct

>>> import numpy as np
>>> import struct
>>> # sign 0, exponent 00000, mantissa 0101100010
>>> a = struct.pack("H", int("0000000101100010", 2))
>>> print(np.frombuffer(a, dtype=np.float16)[0])
2.11e-05
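
The fraction field of a subnormal can also be derived directly, since a subnormal half equals fraction_field * 2^-24 (that is, 0.XXXXXXXXXX * 2^-14). The following is a minimal sketch using only the standard library; the function name encode_half_subnormal is made up for illustration, and it assumes a positive input below the smallest normal value and round-to-nearest-even.

```python
import struct
from fractions import Fraction


def encode_half_subnormal(x: str) -> int:
    """Return the 16-bit pattern of a positive decimal string in the subnormal range.

    Hypothetical helper for illustration only; assumes 0 < x < 2**-14.
    """
    value = Fraction(x)  # exact decimal value, no binary rounding yet
    if not 0 < value < Fraction(1, 2**14):
        raise ValueError("value is not a positive subnormal half")
    # A subnormal is fraction_field * 2**-24, so scale by 2**24 and round.
    # round() on a Fraction rounds half to even.
    fraction_field = round(value * 2**24)
    # Sign bit 0 and exponent field 00000 contribute no set bits.
    return fraction_field


bits = encode_half_subnormal("0.0000211")
print(f"{bits:016b}")  # 0000000101100010

# Cross-check against the standard library's half-precision packing ("e").
(reference,) = struct.unpack("<H", struct.pack("<e", 0.0000211))
print(bits == reference)  # True
```

Using Fraction keeps the arithmetic exact until the single rounding step, which mirrors what you would do by hand.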

So for your final... At the very least you're going to need to learn how to turn a decimal less than 1.0 into binary, and remember a few rules. You seem to be on top of calculating the exponent.

Have a look at...

https://math.stackexchange.com/questions/1128204/how-to-convert-from-floating-point-binary-to-decimal-in-half-precision16-bits

One of the answers to this question has python code for the whole conversion, which may be useful for learning.

old_timer

Instead of your number, I am going to convert 0.2 decimal to binary by hand.

I start with a program that gives me some binary fractions in base 10. There is probably a better way to do this; the method in the link I sent only works for whole numbers.

1/2 0.50000000
1/4 0.25000000
1/8 0.12500000
1/16 0.06250000
1/32 0.03125000
1/64 0.01562500
1/128 0.00781250
1/256 0.00390625
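
A table like the one above could be produced by a short loop such as the following. This is only a sketch; the original program is not shown, and the name rows is an assumption.

```python
# Decimal values of the first eight negative powers of two.
rows = [f"1/{2**k} {1 / 2**k:.8f}" for k in range(1, 9)]
for row in rows:
    print(row)
```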

so:

0.2 - 0.5 no 
0.2 - 0.25 no
0.2 - 0.125 = 0.075
0.075 - 0.0625 = 0.0125
0.0125 - 0.03125 no
0.0125 - 0.015625 no
0.0125 - 0.00781250 = 0.0046875
0.0046875 - 0.00390625 = 0.00078125
0.00078125 - 0.001953125 no
0.00078125 - 0.0009765625 no
0.00078125 - 0.00048828125 = 0.00029296875

I happen to know this cannot be represented exactly in binary (it is a repeating number), so the above tells me:

0.0011001100110011...

Is the binary number for 0.2 in base 10.
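
The subtract-if-it-fits procedure above can be sketched as a small function. The name fraction_to_binary is hypothetical, and the input is assumed to be a decimal string between 0 and 1; Fraction is used so that the subtractions are exact.

```python
from fractions import Fraction


def fraction_to_binary(x: str, digits: int) -> str:
    """Expand a decimal fraction 0 <= x < 1 into `digits` binary places (truncated)."""
    remainder = Fraction(x)
    out = []
    for k in range(1, digits + 1):
        weight = Fraction(1, 2**k)  # 1/2, 1/4, 1/8, ...
        if remainder >= weight:
            out.append("1")  # this power of two fits, so subtract it
            remainder -= weight
        else:
            out.append("0")  # too big, move on to the next power
    return "0." + "".join(out)


print(fraction_to_binary("0.2", 16))  # 0.0011001100110011
```

The same function applied to "0.0000211" gives the long run of leading zeros that pushes that value into the subnormal range.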

Now to normalize this I need 1.xxxx so I shift left 3 and get

1.1001100110011 * 2^(-3)

IEEE 754 single precision format (mantissa and fraction are the same thing)

seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm

The number is positive, so the sign s is zero.

The stored exponent e represents a scale of 2 to the power e-127,

so we add the bias of 127 to -3 and get 124 (0x7C).

Note that since the leading 1 of 1.xxxx is implied, there is no reason to waste a bit on it; it is removed and we just store the fraction.

0 01111100 10011001100110011001100
0011 1110 0100 1100 1100 1100 1100 1100
0x3E4CCCCC

Now I cheated and let the computer convert this for me and got:

0 01111100 10011001100110011001101
0x3E4CCCCD

and that makes sense: the bits chopped off after the 23rd fraction bit begin 11001..., which is more than half a unit of the last kept bit, so rounding to nearest rounds up and the final 1100 becomes 1101. It is the same in base ten, where the part being dropped must be at least half the base (5) to round up, so 0.105 rounds up to 0.11. Likewise, in binary 0.11001 rounds up to 0.1101.
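
Both the truncated and the rounded single-precision patterns can be reproduced with the standard library. This is a sketch for checking the hand calculation, not a general encoder; the variable names are made up for illustration.

```python
import struct
from fractions import Fraction

# Let the machine round 0.2 to single precision and read back the 32 bits.
(rounded,) = struct.unpack(">I", struct.pack(">f", 0.2))
sign = rounded >> 31
exponent = (rounded >> 23) & 0xFF
fraction = rounded & 0x7FFFFF
print(f"{sign} {exponent:08b} {fraction:023b}")  # 0 01111100 10011001100110011001101
print(f"0x{rounded:08X}")  # 0x3E4CCCCD

# Truncate by hand: 0.2 = 1.6 * 2**-3, so 0.2 * 2**26 holds 1.fraction as a
# 24-bit integer; int() drops the excess bits and the mask removes the leading 1.
truncated_fraction = int(Fraction("0.2") * 2**26) & 0x7FFFFF
truncated = (124 << 23) | truncated_fraction
print(f"0x{truncated:08X}")  # 0x3E4CCCCC
```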

The half-precision format appears to be

seeeeemmmmmmmmmm

and the bias is 15, so the scale is 2^(e-15).

So we add 15 to -3 and get 12.

s is 0 because the number is positive, e is 12, and m is, I assume, stored without the implied 1 bit, so

0 01100 1001100110
0011 0010 0110 0110
0x3266

The first bit chopped off is a 0, so it does not round up, assuming a round-to-nearest rounding mode.

So that is the normalized encoding of 0.2 in the 16-bit IEEE floating point format.
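
The normalize, add the bias, drop the implied 1 sequence can be sketched for half precision as follows. The function name encode_half_normal is hypothetical, and the sketch assumes a positive input that lies in the normal range and round-to-nearest-even.

```python
import struct
from fractions import Fraction


def encode_half_normal(x: str) -> int:
    """Encode a positive decimal string as a normal half-precision bit pattern.

    Hypothetical helper for illustration only; no handling of zero, negative
    numbers, subnormals, overflow, infinities or NaN.
    """
    value = Fraction(x)
    exponent = 0
    while value < 1:  # shift left until the value reads 1.xxxx
        value *= 2
        exponent -= 1
    while value >= 2:  # or shift right if it started at 2 or above
        value /= 2
        exponent += 1
    # Drop the implied leading 1 and keep 10 fraction bits (half to even).
    fraction_field = round((value - 1) * 2**10)
    # Adding instead of OR-ing lets a rounding carry spill into the exponent.
    return ((exponent + 15) << 10) + fraction_field


bits = encode_half_normal("0.2")
print(f"0x{bits:04X}")  # 0x3266

# Cross-check against the standard library's half-precision packing.
(reference,) = struct.unpack(">H", struct.pack(">e", 0.2))
print(bits == reference)  # True
```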

Wikipedia is good enough to understand the rest. When you normalize a number to 1.xxxxx you shift it some number N of bits (left if it is less than 1, as in this case, or right if it is 2 or more). For a number less than 1 the result is 1.xxxx times 2^(-N). The Wikipedia page gives the minimum exponent as

Emin = 00001₂ − 01111₂ = −14

So an N of 14 is the worst case you can have; if you would have to shift more than 14 bits, you cannot normalize the number. There is a special case for this, shown on Wikipedia, called subnormal (the same thing as denormal). You shift the number exactly 14 bits to the left, which is implied by the 2^-14, so your binary number becomes 0.xxxxxxxxxx * 2^-14. The first ten x bits are your mantissa/fraction, and the exponent field in the encoding is the special value 00000.

So 0 00000 xxxxxxxxxx is the encoding for a denormal in IEEE 754 half-precision binary.
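
To see the normal and subnormal rules side by side, here is a minimal decoder sketch for a 16-bit pattern. The name decode_half is an assumption made for illustration; infinities and NaN (exponent field 11111) are deliberately left out.

```python
import struct
from fractions import Fraction


def decode_half(bits: int) -> Fraction:
    """Return the exact value of a finite half-precision bit pattern."""
    sign = -1 if bits >> 15 else 1
    exponent = (bits >> 10) & 0x1F
    fraction = Fraction(bits & 0x3FF, 2**10)  # the 10 stored bits as 0.xxxxxxxxxx
    if exponent == 0:
        # Subnormal (or zero): implied leading 0, exponent fixed at -14.
        return sign * fraction * Fraction(1, 2**14)
    if exponent == 0x1F:
        raise ValueError("infinity and NaN are not handled in this sketch")
    # Normal: implied leading 1, exponent is the stored field minus the bias.
    return sign * (1 + fraction) * Fraction(2) ** (exponent - 15)


print(float(decode_half(0x3266)))  # the normal encoding of 0.2 from above
print(decode_half(0x0001))         # smallest subnormal, 1/16777216
print(decode_half(0x0400))         # smallest normal, 1/16384
```

Comparing 0x03FF (largest subnormal) with 0x0400 (smallest normal) shows that the two rules meet without a gap.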