I need to pass some data from Go to a '300 es' shader. The data consists of two uint16s packed into a uint32, where each uint16 represents a half-precision float (float16). I found some public-domain (PD) Java code that looks like it will do the job, but I am struggling to port the last statement, which uses a couple of zero-extending right shifts ('>>>'). I think the other shifts are fine, i.e. their operands are non-negative. Since Go picks sign or zero extension from the type of the operand (arithmetic for signed, logical for unsigned), the solution to the port is eluding me. I did think the first one could perhaps be changed into a left shift, since it just seems to be positioning a single bit for addition, but the final shift blows my mind out of the water :)
Btw, I hope I got the bracketing right, since operator precedence differs between Go and Java here: in Java '-' binds tighter than '>>', while in Go '>>' binds tighter than '-'...
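To make the precedence point concrete, here is a minimal sketch in Go. The helper names goGrouping and javaGrouping are made up for illustration; the input is the bit pattern of float32(1.0) with no rounding offset added.

```go
package main

import "fmt"

// goGrouping is how Go parses the unbracketed tokens "rv - 0x38000000 >> 13":
// the shift is applied to the constant first, then the subtraction happens.
func goGrouping(rv uint32) uint32 {
	return rv - 0x38000000>>13
}

// javaGrouping mirrors how Java parses "rv - 0x38000000 >>> 13":
// the subtraction happens first, then the whole difference is shifted.
func javaGrouping(rv uint32) uint32 {
	return (rv - 0x38000000) >> 13
}

func main() {
	var rv uint32 = 0x3f800000 // bits of float32(1.0)
	fmt.Printf("Go grouping:   %#x\n", goGrouping(rv))   // 0x3f7e4000
	fmt.Printf("Java grouping: %#x\n", javaGrouping(rv)) // 0x3c00, i.e. half-precision 1.0
}
```

The same applies to '+' versus '>>', so any Java expression that mixes additive operators with shifts needs explicit brackets when moved to Go.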
I need to go the other way around next, but that is hopefully easier without right shifts... famous last words!
Java code:
https://stackoverflow.com/a/6162687/345165
// returns all higher 16 bits as 0 for all results
public static int fromFloat( float fval )
{
    int fbits = Float.floatToIntBits( fval );
    int sign = fbits >>> 16 & 0x8000; // sign only
    int val = ( fbits & 0x7fffffff ) + 0x1000; // rounded value
    if( val >= 0x47800000 ) // might be or become NaN/Inf
    { // avoid Inf due to rounding
        if( ( fbits & 0x7fffffff ) >= 0x47800000 )
        { // is or must become NaN/Inf
            if( val < 0x7f800000 ) // was value but too large
                return sign | 0x7c00; // make it +/-Inf
            return sign | 0x7c00 | // remains +/-Inf or NaN
                ( fbits & 0x007fffff ) >>> 13; // keep NaN (and Inf) bits
        }
        return sign | 0x7bff; // unrounded not quite Inf
    }
    if( val >= 0x38800000 ) // remains normalized value
        return sign | val - 0x38000000 >>> 13; // exp - 127 + 15
    if( val < 0x33000000 ) // too small for subnormal
        return sign; // becomes +/-0
    val = ( fbits & 0x7fffffff ) >>> 23; // tmp exp for subnormal calc
    return sign | ( ( fbits & 0x7fffff | 0x800000 ) // add subnormal bit
        + ( 0x800000 >>> val - 102 ) // round depending on cut off
        >>> 126 - val ); // div by 2^(1-(exp-127+15)) and >> 13 | exp=0
}
My partial port:
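import "math"

// float32toUint16 returns the half-precision bit pattern for f.
func float32toUint16(f float32) uint16 {
	fbits := math.Float32bits(f)
	sign := uint16((fbits >> 16) & 0x00008000) // sign only
	rv := (fbits & 0x7fffffff) + 0x1000        // rounded value
	if rv >= 0x47800000 {                      // might be or become NaN/Inf
		if (fbits & 0x7fffffff) >= 0x47800000 {
			if rv < 0x7f800000 {
				return sign | 0x7c00 // was a value but too large: +/-Inf
			}
			return sign | 0x7c00 | uint16((fbits&0x007fffff)>>13) // keep NaN bits
		}
		return sign | 0x7bff // unrounded not quite Inf
	}
	if rv >= 0x38800000 { // remains a normalized value
		return sign | uint16((rv-0x38000000)>>13)
	}
	if rv < 0x33000000 { // too small for subnormal
		return sign
	}
	rv = (fbits & 0x7fffffff) >> 23 // tmp exp for subnormal calc
	// these two shifts are my problem; in Go '>>' binds tighter than '+',
	// so the sum needs its own brackets to group the way the Java does
	return sign | uint16((((fbits&0x7fffff)|0x800000)+(0x800000>>(rv-102)))>>(126-rv))
}

// pack16 puts f1 in the high 16 bits and f2 in the low 16 bits.
func pack16(f1, f2 float32) uint32 {
	ui161 := float32toUint16(f1)
	ui162 := float32toUint16(f2)
	return uint32(ui161)<<16 | uint32(ui162)
}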
I also found what looks like even more efficient, branch-free code, but understanding how it works well enough to port it is a bit ;) beyond my rusty (not the language) skills.
https://stackoverflow.com/a/5587983
Cheers
[Edit] The code appears to work with the values I am currently using (it's hard to be precise since I have no experience debugging a shader). So I guess my question is about the correctness of my port, especially the final two shifts.
[Edit2] In the light of day I can see I already got the precedence wrong in one place and fixed the above example.
changed:
return sign | uint16(rv-(0x38000000>>13))
to:
return sign | uint16((rv-0x38000000)>>13)