I have a Pandas dataframe with 4 rows, and one of its columns (named limit) contains floating-point values in which any zeros must be replaced with 9999999999 (9.999999999 billion). The column has the float32 data type, and I use the pandas.Series.where method on it to do the replacement. But it's not working as expected, because NumPy rounds 9999999999 up to 10000000000 (10 billion).

I've tried this in IPython 3 (Python 3.6.8), with Pandas version 0.24.2 and NumPy version 1.14.0.

This is the replacement statement:

df['limit'] = df['limit'].where(df['limit'] != 0, 9999999999)

I'm seeing the following column values for limit:

0    1.000000e+10
1    1.000000e+10
2    1.000000e+10
3    1.000000e+10

but I'm expecting

0    9999999999.0
1    9999999999.0
2    9999999999.0
3    9999999999.0

Why does the rounding up happen? This doesn't happen with plain Python:

In [1]: (9.999999999) * 10**9                                                                                                    
Out[1]: 9999999999.0
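
For reference, here's a minimal way to reproduce what I'm seeing (my real data isn't shown above, so the zero values below just stand in for it):

import numpy as np
import pandas as pd

# a 4-row frame with a float32 'limit' column, standing in for my real data
df = pd.DataFrame({'limit': np.zeros(4, dtype=np.float32)})
df['limit'] = df['limit'].where(df['limit'] != 0, 9999999999)

print(df['limit'])              # every row shows 1.000000e+10, as above
print(np.float32(9999999999))   # 1e+10 -- the same rounding happens at the NumPy level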

1 Answer

Kasrâmvd:

This is simply because a 32-bit type is not capable of preserving that number exactly: your column is float32, which carries only 24 bits of significand precision. You can check this by calculating the number of bits needed to represent the number:

In [24]: np.floor(np.log2(9999999999)) + 1
Out[24]: 34.0

As you can see, you need at least 34 bits to represent that number exactly, while float32 has only 24 significant bits, so NumPy rounds it to the nearest representable float32 value, which is exactly 1e10 (10 billion). Therefore you should use a 64-bit data type, float64 (or int64 if the column only ever holds integers), to represent it.
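
You can check the precision limit directly with NumPy; a quick self-contained sketch using the same numbers as above:

import numpy as np

# float32 keeps 24 significant bits (23 stored + 1 implicit),
# fewer than the 34 bits the value needs
print(np.finfo(np.float32).nmant + 1)     # 24

# so the value is rounded to the nearest representable float32,
# which happens to be exactly 1e10
print(np.float32(9999999999) == 10**10)   # True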

Even if you test this by putting the number in a series with another 32-bit data type, int32, you'll see an unexpected result again (in that case an integer overflow rather than rounding):

In [25]: s = pd.Series([9999999999], dtype=np.int32)

In [26]: s
Out[26]: 
0    1410065407
dtype: int32
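
A minimal sketch of the fix, assuming the column can simply be upcast before the replacement (the zero-filled frame below just stands in for the data in the question):

import numpy as np
import pandas as pd

# stand-in for the 4-row frame with a float32 'limit' column
df = pd.DataFrame({'limit': np.zeros(4, dtype=np.float32)})

# upcast to float64 (53 significant bits) before replacing the zeros
df['limit'] = df['limit'].astype(np.float64).where(df['limit'] != 0, 9999999999)

print(df['limit'].iloc[0])   # 9999999999.0 -- no rounding
print(df['limit'].dtype)     # float64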