I want to write this (-0.67) in an appropriate 8-bit Qn.m format how can I do that? normally we have -1.67 which can be represented as 2-bits for the integer part and 4 bits for the fraction part but now we have -0.67 how can we assign number of bits to it?
Change -0.67 floating point to fixed point
106 views Asked by Fatima Ismail At
2
There are 2 answers
0
On
Not sure why you said -1.67 took 2 bits for the integer and 4 for the fraction. If you're using fixed point, the number of bits on either side of the point is, well, fixed. If you're using 4.4, for example, then you get these:
+1.67 = 0001.1011₂ (rounded), so -1.67 = 1110.0101₂
+0.67 = 0000.1011₂ (rounded), so -0.67 = 1111.0101₂
Four binary digits right of the point gets you down to sixteenths, so you want to convert 0.67 to the closest sixteenth. Decimal 0.67 is just another way to write 67/100; the LCM of 100 and 16 is 400, so converting to that denominator we get 268/400. That falls between 10/16 = 250/400 (0.1010₂) and 11/16 = 275/400 (0.1011₂), but closer to 11/16, so 0.1011₂ is our answer for the fractional part.
How to convert a number to fixed point representation
f = Number of decimal bits
n = Floating point number
a = n * 2^f
b = fix(a) or round(a) ('underlying' integer associated with the fixed point number)
binary fixed point (bfp) = b/(2^f)
Example:
Let:
f = 4
n = .67
a = .67 * 2^4
a = 10.72
b = fix(a) # use a rounding scheme get the precision we want, fix() rounds toward 0
b = 10
bfp = b/2^4
bfp = 10/2^4
bfp = .625 (as a float; which is an approximation; we lost precision using a finite number of fractional bits)
bfp as binary s3.4 format = 0000.1010
You could also think of this number as the integer 10, with an associated scale factor of 2^4.
To get the negative-> take the twos complement
First take the one's complement then add 1
bfp ones = 1111.0101
bfp twos = 1111.0110
The float minus .67 is 1111.0110 in signed two's complement binary s3.4 format.
You could also think of this the integer -10 with an associated scale factor of 2^4.
This process of taking a float and converting to a fixed point number with finite precision (a finite number of fractional bits) is also known as quantization.
One way to perform quantization is to use the Matlab quantizer/quantize functions, or the Matlab Fi object.