Unexpected uint64 behaviour 0xFFFF'FFFF'FFFF'FFFF - 1 = 0?


Consider the following brief numpy session showcasing the uint64 data type:

import numpy as np
 
a = np.zeros(1,np.uint64)
 
a
# array([0], dtype=uint64)
 
a[0] -= 1
a
# array([18446744073709551615], dtype=uint64)
# this is 0xffff ffff ffff ffff, as expected

a[0] -= 1
a
# array([0], dtype=uint64)
# what the heck?

I'm utterly confused by this last output.

I would expect 0xFFFF'FFFF'FFFF'FFFE.

What exactly is going on here?

My setup:

>>> sys.platform
'linux'
>>> sys.version
'3.10.5 (main, Jul 20 2022, 08:58:47) [GCC 7.5.0]'
>>> np.version.version
'1.23.1'

There are 5 answers

13
user2357112 On BEST ANSWER

By default, NumPy converts Python int objects to numpy.int_, a signed integer dtype corresponding to C long. (This decision was made back in the early days when Python int also corresponded to C long.)

There is no integer dtype big enough to hold all values of numpy.uint64 dtype and numpy.int_ dtype, so operations between numpy.uint64 scalars and Python int objects produce float64 results instead of integer results. (Operations between uint64 arrays and Python ints may behave differently, as the int is converted to a dtype based on its value in such operations, but a[0] is a scalar.)

Your first subtraction produces a float64 with value -1, and your second subtraction produces a float64 with value 2**64 (since float64 doesn't have enough precision to perform the subtraction exactly). Both of these values are out of range for uint64 dtype, so converting back to uint64 for the assignment to a[0] produces undefined behavior (inherited from C - NumPy just uses a C cast).

On your machine, this happened to produce wraparound behavior, so -1 wrapped around to 18446744073709551615 and 2**64 wrapped around to 0, but that's not a guarantee. You might see different behavior on other setups. People in the comments did see different behavior.
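A small sketch of the two steps described above. To keep it independent of the NumPy version's scalar promotion rules, the float64 conversion is written out explicitly instead of relying on `a[0] - 1`; the variable names are only for illustration.

```python
import numpy as np

# uint64 and a signed 64-bit integer have no common integer dtype,
# so the promoted dtype is float64.
common = np.promote_types(np.uint64, np.int64)
print(common)  # float64

# The two intermediate results, computed with explicit float64 arithmetic.
first = np.float64(np.uint64(0)) - np.float64(1)
second = np.float64(np.uint64(2**64 - 1)) - np.float64(1)

print(first)               # -1.0
print(second == 2.0**64)   # True: float64 cannot represent 2**64 - 2
```

Both values lie outside the uint64 range (0 to 2**64 - 1), which is why the cast back into the array is not well defined.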

7
Kelly Bundy On

a[0] - 1 is 1.8446744073709552e+19, a numpy.float64. That can't retain all the precision, so its value is 18446744073709551616 = 2**64. Which, when written back into a with dtype np.uint64, becomes 0.
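The precision loss can be checked without NumPy, since a Python float is also an IEEE 754 double. This is only a sketch of the rounding, not of what NumPy does internally.

```python
import math

max_u64 = 2**64 - 1

# Converting to a double rounds to the nearest representable value.
as_float = float(max_u64)
print(as_float == 2**64)          # True
print(int(as_float))              # 18446744073709551616

# Neighbouring doubles at this magnitude are thousands apart,
# so subtracting 1 changes nothing.
print(math.ulp(as_float))         # 4096.0
print(as_float - 1 == as_float)   # True
```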

1
Ξένη Γήινος On

All the existing answers are correct. I just want to add that on Windows 10 I got a different result, namely 9223372036854775808.

Steps to reproduce:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: a = np.zeros(1,np.uint64)

In [3]: a
Out[3]: array([0], dtype=uint64)

In [4]: a[0] -= 1

In [5]: a
Out[5]: array([18446744073709551615], dtype=uint64)

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

In [10]: f'{a[0]:b}'
Out[10]: '1000000000000000000000000000000000000000000000000000000000000000'

In [11]: len(_)
Out[11]: 64

In [12]: a[0] == 2**63
Out[12]: True

In [13]: a[0] - 1
Out[13]: 9.223372036854776e+18

In [14]: a[0] - 1 == 2 ** 63
Out[14]: True

In [15]: a[0] -= 1

In [16]: a[0]
Out[16]: 9223372036854775808

In [17]: np.version.version
Out[17]: '1.24.2'

In binary, incrementing by one flips the last bit. If that bit goes from one to zero, the carry moves to the bit before it, and so on to the left until some bit goes from zero to one.

In uint64, if you subtract one from zero, zero can't get any smaller, so it is treated as 2^64. Subtracting one from that gives 2^64-1, which is '1'*64 in binary and 18446744073709551615 in decimal.

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

Then, when the value is combined with a Python int, it is converted to the float 1.8446744073709552e+19, which, because of the limited precision of the format, is actually 2^64.

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

Now this gets interesting. The maximum value uint64 can hold is 2^64 - 1. 2^64 is a one followed by 64 zeros in binary, so it cannot be represented in uint64, and casting the float 2^64 back to uint64 is an out-of-range conversion.

That's why there is a warning.

On my machine the out-of-range cast seems to go through signed int64 before the result is read as uint64.

The bit pattern that comes out is '1'+'0'*63. Read as signed int64, where the leftmost bit is the sign bit, that is the most negative value, -2^63 (not -1, which would be '1'*64).

Because one bit is used for the sign, the maximum value of int64 is 2^63-1, which is 9223372036854775807 in decimal.

When that bit pattern is read as uint64 it is 2^63, which is 9223372036854775808 in decimal.

Then the number stays there no matter how many decrements I do: each time, the uint64 value is converted to a float with value 2^63, and subtracting one cannot change that value because float64 cannot represent 2^63 - 1.
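For reference, here is how 64-bit patterns read as signed versus unsigned, using only the standard library. It illustrates two's complement in general and does not claim to reproduce the cast NumPy performs; the helper name `reinterpret_i64_as_u64` is made up for this sketch.

```python
import struct

def reinterpret_i64_as_u64(value):
    """Pack a signed 64-bit integer and read the same 8 bytes as unsigned."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]

# -1 as int64 has all 64 bits set.
print(reinterpret_i64_as_u64(-1))        # 18446744073709551615

# The most negative int64 has only the top bit set.
top_bit = reinterpret_i64_as_u64(-2**63)
print(top_bit)                           # 9223372036854775808
print(format(top_bit, "064b").count("1"))  # 1

# 2**63 - 1 is not representable as a double; it rounds to 2**63.
print(float(2**63 - 1) == 2.0**63)       # True
```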

0
Albert.Lang On

Possible workarounds

1. Explicit cast

a[0] -= np.uint64(1)

++

  • clean
  • fast

--

  • cumbersome

2. Fancy indexing

a[[0]] -= 1

+

  • easy to type

--

  • slow

3. Slice indexing

a[0:1] -= 1

-

  • mildly cumbersome
  • not the fastest
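A quick check of the three variants, as a sketch. It starts from the maximum uint64 value so that the subtraction itself never goes below zero, and the helper `fresh` is just an illustrative name.

```python
import numpy as np

MAX_U64 = 2**64 - 1

def fresh():
    """Return a new one-element uint64 array holding the maximum value."""
    return np.array([MAX_U64], dtype=np.uint64)

a = fresh()
a[0] -= np.uint64(1)   # 1. explicit cast: uint64 scalar minus uint64 scalar

b = fresh()
b[[0]] -= 1            # 2. fancy indexing: array operation, stays uint64

c = fresh()
c[0:1] -= 1            # 3. slice indexing: array operation, stays uint64

for arr in (a, b, c):
    print(hex(int(arr[0])))  # 0xfffffffffffffffe
```

All three keep the arithmetic in uint64, so no float64 round trip is involved.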
1
JohnD9191 On

Part of the behavior you are seeing is due to how unsigned integer arithmetic works. When an unsigned integer is decremented below zero, the result "wraps around" to the maximum value of the data type.

In your example, a[0] starts at 0. Subtracting 1 ends up at 0xFFFFFFFFFFFFFFFF, which is the maximum value for a 64-bit unsigned integer. Subtracting 1 again should give 0xFFFFFFFFFFFFFFFE, as you expected. Wraparound alone does not explain the 0 you got instead; that comes from the float64 conversion described in the other answers.

If you want to avoid wraparound at zero, you can use a signed integer data type instead.