I was trying to use numpy.divmod with very large integers and I noticed a strange behaviour. At around 2**63 ~ 1e19 (the limit of a native signed 64-bit integer, which is what NumPy uses by default), this happens:
from numpy import divmod

test = 10**6
for i in range(15, 25):
    x = 10**i
    print(i, divmod(x, test))
15 (1000000000, 0)
16 (10000000000, 0)
17 (100000000000, 0)
18 (1000000000000, 0)
19 (10000000000000.0, 0.0)
20 ((100000000000000, 0), None)
21 ((1000000000000000, 0), None)
22 ((10000000000000000, 0), None)
23 ((100000000000000000, 0), None)
24 ((1000000000000000000, 0), None)
Somehow, the quotient and remainder work fine up to 2**63; beyond that, something different happens.
My guess is that the int representation is "vectorized" (like BigInt in Scala, i.e. a little-endian Seq of Long). But then I'd expect divmod(array, test) to return a pair of arrays: the array of quotients and the array of remainders.
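For comparison, with values that do fit in a native dtype, np.divmod does return such a pair (a small illustrative example, separate from the loop above; exact repr formatting may vary):

>>> import numpy as np
>>> np.divmod(np.array([10**15, 10**16]), 10**6)
(array([1000000000, 10000000000]), array([0, 0]))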
I have no clue about this behaviour. It does not happen with the built-in divmod (everything works as expected). Why does this happen? Does it have something to do with the internal representation of int?

Details: numpy version 1.13.1, Python 3.6
The problem is that np.divmod will convert the arguments to arrays, and what happens is really easy: you will get an object array for 10**i with i > 19; in the other cases it will be a "real" NumPy array.
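You can check the dtype switch directly (a minimal check against the 1.13-era promotion rules; newer NumPy releases changed this behaviour and no longer silently fall back to object arrays):

>>> import numpy as np
>>> np.asarray(10**18).dtype    # fits in a signed 64-bit integer
dtype('int64')
>>> np.asarray(10**19).dtype    # too big for int64, still fits in uint64
dtype('uint64')
>>> np.asarray(10**20).dtype    # too big for any native integer dtype
dtype('O')

The uint64 case also explains the float results in the i = 19 row of your output: combining a uint64 array with a signed Python int makes NumPy fall back to float64.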
And, indeed, it seems like object arrays behave strangely with np.divmod:
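Reproducing the i = 20 case on the converted argument (output as produced by 1.13; this is arguably a bug and may differ in later versions):

>>> np.divmod(np.asarray(10**20), 10**6)
((100000000000000, 0), None)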
I guess in this case the normal Python built-in divmod calculates the first returned element and all remaining items are filled with None, because NumPy delegated to Python's function.
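That would match what the plain built-in returns for the same operands, with the whole tuple ending up in the first slot:

>>> divmod(10**20, 10**6)
(100000000000000, 0)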
Note that object arrays often behave differently than native-dtype arrays. They are a lot slower and often delegate to Python functions (which is often the reason for different results).
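As a small illustration of that delegation (an illustrative example, not from the question): elementwise arithmetic on an object array just calls the Python operators on each element, which is why arbitrary-precision ints keep working there:

>>> np.array([10**20, 10**21], dtype=object) + 1
array([100000000000000000001, 1000000000000000000001], dtype=object)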