While importing data into BigQuery, my hex strings were converted into floats. I understand I need to fix the import, but I'd like to do a best-effort recovery of some of the data.
I'm trying to convert them back into hex, but toy examples produce unexpected behavior.
For example, given the following hex value:
hh = 0x6de517a18f003625e7fba9b9dc29b310f2e3026bbeb1997b3ada9de1e3cec8d6
# int: 49706871569187420659586066466638340615522392400360198520171375183123350210774
# float: 4.9706871569187424e+76
I'm not sure why the last few digits change from 420 to 424 in the float.
Converting this value to a float and then back into hex heavily truncates it:
ff = 4.9706871569187424e+76 # same as calling float.fromhex('0x6de517a18f003625e7fba9b9dc29b310f2e3026bbeb1997b3ada9de1e3cec8d6')
int(ff) # 49706871569187423635521182730432496296162592228596139982404260202468916330496
# not sure why I'm getting so many significant figures
hex(int(ff))
# '0x6de517a18f003800000000000000000000000000000000000000000000000000'
To me this is unexpected, since the last non-zero hex digits change (0036 -> 0038). I assume it has something to do with how the mantissa is represented, but I was hoping someone here would have a quick answer so I don't have to go on a deep dive into Python's float implementation.
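For what it's worth, both effects can be reproduced by hand. On typical platforms a Python float is an IEEE 754 double with a 53-bit significand, so the integer gets rounded to its 53 leading bits. int(ff) then prints that rounded value exactly (hence the many digits), while the float's repr only shows enough digits to round-trip (hence ...424 for a stored value of ...4236...). A minimal sketch, where `round_to_53_bits` is a hypothetical helper written for illustration, not a library function:

```python
hh = 0x6de517a18f003625e7fba9b9dc29b310f2e3026bbeb1997b3ada9de1e3cec8d6

def round_to_53_bits(n: int) -> int:
    """Round a positive int to 53 significant bits, ties to even (hypothetical helper)."""
    shift = n.bit_length() - 53
    if shift <= 0:
        return n  # already fits in 53 bits, nothing is lost
    q, r = divmod(n, 1 << shift)   # q: leading 53 bits, r: the bits that get dropped
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1                     # round up
    return q << shift              # dropped bits become zeros

rounded = round_to_53_bits(hh)
print(hex(rounded))                  # 0x6de517a18f0038 followed by 50 zeros
print(rounded == int(float(hh)))     # True
```

Here the 53rd bit falls inside the 14th hex digit (the 6 in 0036), and the dropped bits are more than half a step, so that digit rounds up to 8.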
Thanks @mark-tolonen for the pointer to the 53 bits of float64 precision and rounding. For my use case, a best-effort mapping to recover from the auto-conversion, the following code will suffice:
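A minimal sketch of this kind of recovery (not necessarily the exact code used, and `leading_bits` is a hypothetical name): it keeps only the leading 51 bits of the integer, which sit inside the 53 bits a float64 can hold. It assumes the floats came from large integers like the example above.

```python
def leading_bits(ff: float, nbits: int = 51) -> str:
    """Return the first `nbits` binary digits of int(ff) as a string of 0s and 1s."""
    bb = bin(int(ff))          # int() of a float is exact; bin() gives '0b1101...'
    return bb[2:2 + nbits]     # skip the '0b' prefix, keep the leading bits

ff = 4.9706871569187424e+76
bits = leading_bits(ff)
print(bits)
print(len(bits))
```

This is best effort only: when the bits just after the kept prefix are all ones, rounding can carry into the prefix and change it.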
A bit more explanation:
Each hex digit represents 4 bits (2^4 = 16), so the 53-bit limit is easier to see when looking at binary positions than at hex digits.
Since the binary string is prefixed with '0b', we take the slice 2:(2+51), which is how we get to
bb[2:53]
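Applied to the example, the slice can be checked against the original value. This is only a sketch: `could_be_original` is a hypothetical helper, and it assumes a list of candidate original values is available to test against.

```python
hh = 0x6de517a18f003625e7fba9b9dc29b310f2e3026bbeb1997b3ada9de1e3cec8d6
ff = float(hh)                 # what the import effectively stored

bb = bin(int(ff))
prefix = bb[2:53]              # 51 leading bits, '0b' skipped

# The original value starts with the same 51 bits, so the prefix survived.
print(bin(hh)[2:53] == prefix)

# 51 bits = 3 bits for the leading hex digit 6, plus 12 more hex digits of 4 bits.
hex_prefix = format(int(prefix, 2), "x")
print(hex_prefix)              # 6de517a18f003

def could_be_original(candidate: int, stored: float) -> bool:
    """Hypothetical check: does `candidate` convert to exactly the stored float?"""
    return float(candidate) == stored

print(could_be_original(hh, ff))
```

The 51-bit slice lines up with whole hex digits here only because the leading digit uses 3 bits; for other values the cut may land mid-digit.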