I have recently been confused about the definition of underflow in the IEEE 754 standard. If an implementation doesn't support subnormal numbers, then the smallest positive number that can be represented in single precision is MinNorm = 1.0 * 2^-126, and any operation whose nonzero result is smaller in magnitude than MinNorm is regarded as an underflow. But if an implementation supports subnormal numbers, then the smallest positive number that can be represented is MinSubnorm = 1.0 * 2^-149. So here is the question: if an operation's result is smaller than MinNorm, is it an underflow? What about smaller than MinSubnorm?
I am now working on the implementation of an FPU that supports subnormal numbers. Assume that the result before rounding is strictly between -MinNorm and +MinNorm, and that it is still smaller than MinNorm after rounding (i.e. representable as a subnormal number). How should I treat it: as underflow or not? Do I need to set the underflow status bit?
I have found some information online, but opinions diverge, as follows:
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/underflow.html Underflow occurs when you perform an operation that's smaller than the smallest magnitude non-zero number. In IEEE 754 single precision this means a value which has magnitude (i.e., absolute value) less than 1.0 x 2^-149.
http://en.wikipedia.org/wiki/Arithmetic_underflow Arithmetic underflow can occur when the true result of a floating point operation is smaller in magnitude (that is, closer to zero) than the smallest value representable as a normal floating point number in the target datatype
The IEEE754 2008 standard (§7.5) defines that the underflow exception shall be signalled when the result is
So in this case, Wikipedia is correct.
UPDATE: The default rule is that you DO set the status bit unless the result is exact. For example, if a subnormal result is obtained from an addition or subtraction, then no rounding needs to occur, so you won't set the underflow status bit. On the other hand, if you take the number 1.0001 and multiply it by 2^-149, the result can't be represented exactly and will be rounded to 2^-149, so you would set both the underflow and inexact status bits.