Numba corrupts data when assigning in-place


Numba and NumPy don't execute the following foo function in the same way:

from numba import jit
import numpy as np

@jit
def foo(a):
    a[:] = a[::-1] # reverse the array

a = np.array([0, 1, 2])
foo(a)
print(a)

With NumPy (without @jit) it prints [2, 1, 0], while with Numba (with @jit) it prints [2, 1, 2]. It looks like Numba modifies the array in-place, which leads to data corruption. It is easy to work around by making a copy of the array:

a[:] = a[::-1].copy()

But is this the desired behavior? Shouldn't Numba and NumPy give the same result?

I am using Numba v0.26.0 in Python 3.5.2.


Answer by sklam (best answer)

This is a known issue (https://github.com/numba/numba/issues/1960), and it was fixed in Numba 0.27. Following NumPy's behavior, the fix detects overlap and makes temporary copies to avoid corrupting the data.
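The general idea of "detect overlap, then copy" can be sketched in plain NumPy. This is not Numba's actual implementation: `assign_with_overlap_check` is a hypothetical helper, it only handles 1-D arrays, and it uses `np.may_share_memory` (a cheap, conservative check) to decide when a temporary copy is needed.

```python
import numpy as np

def assign_with_overlap_check(dst, src):
    """Hypothetical helper: copy 1-D src into dst, first making a
    temporary copy of src if the two may share memory."""
    if np.may_share_memory(dst, src):
        src = src.copy()  # break the aliasing before any writes happen
    # Element-by-element writes, like a naive compiled loop would do
    for i in range(dst.shape[0]):
        dst[i] = src[i]

a = np.array([0, 1, 2])
assign_with_overlap_check(a, a[::-1])
print(a)  # [2 1 0]
```

Without the `copy()` line, the same loop would produce [2 1 2], the corrupted result from the question.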

Answer by hpaulj

Your jitted function has the same kind of in-place problem as this Python loop:

In [718]: x=list(range(3))
In [719]: for i in range(3):
     ...:     x[i] = x[2-i]
In [720]: x
Out[720]: [2, 1, 2]
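The same effect shows up on a NumPy array with an explicit loop. Treating this loop as roughly what a naive compiled version of `a[:] = a[::-1]` amounts to is an assumption for illustration, not a description of Numba's internals; the function names are made up.

```python
import numpy as np

def reverse_unbuffered(x):
    # No temporary: later reads see values that were already overwritten
    n = len(x)
    for i in range(n):
        x[i] = x[n - 1 - i]
    return x

def reverse_buffered(x):
    # Read the whole source into a temporary first, then write
    tmp = x[::-1].copy()
    for i in range(len(x)):
        x[i] = tmp[i]
    return x

print(reverse_unbuffered(np.arange(3)))  # [2 1 2]
print(reverse_buffered(np.arange(3)))    # [2 1 0]
```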

x[:] = x[::-1] is buffered, not because NumPy recognizes that something special is happening, but because it always uses some sort of buffering when doing assignments.

The Python interpreter translates [] notation into calls to __setitem__ and __getitem__, so In [681] and In [682] do the same thing (running both reverses x twice, which is why it ends up back at [0, 1, 2]):

In [680]: x=np.arange(3)
In [681]: x[:] = x[::-1]
In [682]: x.__setitem__(slice(None), x.__getitem__(slice(None,None,-1)))
In [683]: x
Out[683]: array([0, 1, 2])
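A small class that logs its subscript calls makes this translation visible, including the order: the right-hand side's __getitem__ finishes before __setitem__ is called. `Tracer` is a made-up name for illustration.

```python
class Tracer:
    """Hypothetical class that records the subscript calls Python makes."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return "rhs"  # placeholder value standing in for the slice result

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))

t = Tracer()
t[:] = t[::-1]
print(t.calls)
# [('get', slice(None, None, -1)), ('set', slice(None, None, None), 'rhs')]
```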

That means that x[::-1] is evaluated in full, producing a temporary object, before being copied to x[:]. But x[::-1] is a view, not a copy, so the setitem step must do some sort of buffered copy.
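That x[::-1] is a view sharing x's buffer is easy to confirm, and it is exactly why a plain front-to-back element copy would read values it had already overwritten:

```python
import numpy as np

x = np.arange(3)
r = x[::-1]                    # basic slicing returns a view, not a copy
print(r.base is x)             # True: r borrows x's buffer
print(np.shares_memory(x, r))  # True
x[0] = 99                      # a write through x...
print(r)                       # [ 2  1 99]  ...is visible through r
```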

Another way to do this copy is with

np.copyto(x, x[::-1])

Checking x.__array_interface__, I see that the data buffer address stays the same. So it really is copying values, not just swapping in a new data buffer address, but it does so in low-level compiled code.
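The buffer-address check described above can be written out like this; `__array_interface__["data"][0]` is the address of the array's data buffer.

```python
import numpy as np

x = np.arange(5)
addr_before = x.__array_interface__["data"][0]  # address of the data buffer
np.copyto(x, x[::-1])                           # source and destination overlap
addr_after = x.__array_interface__["data"][0]

print(x)                          # [4 3 2 1 0]
print(addr_before == addr_after)  # True: values were copied into the same buffer
```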

Usually buffering is just an implementation issue that users don't need to worry about. ufunc.at is designed to deal with cases where that buffering creates problems. This topic comes up periodically; search for add.at.
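The classic case where buffering changes the result is an in-place operation with repeated fancy indices, which is what ufunc.at (here np.add.at) is for:

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 appears twice

a = np.zeros(3, dtype=int)
a[idx] += 1                # buffered: the repeated index 0 is incremented only once
print(a)                   # [1 1 0]

b = np.zeros(3, dtype=int)
np.add.at(b, idx, 1)       # unbuffered: every occurrence of an index is applied
print(b)                   # [2 1 0]
```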

=============

Note that Python lists behave the same way. The translation to 'get/setitem' is the same.

In [699]: x=list(range(3))
In [700]: x[:] = x[::-1]
In [701]: x
Out[701]: [2, 1, 0]
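One difference worth noting: for lists, the x[::-1] step already builds a brand-new list, so __setitem__ never sees aliased data in the first place.

```python
x = list(range(3))
r = x[::-1]    # list slicing always builds a new list (a copy)
print(r is x)  # False
x[0] = 99
print(r)       # [2, 1, 0]: the slice is unaffected by later writes to x
```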

======================

I'm not sure whether this is relevant, but since I tested these ideas I'll document them. https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html suggests using np.nditer as a stepping stone for implementing iterative tasks in Cython.

A first stab at using nditer is:

In [769]: x=np.arange(5)
In [770]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']])
In [771]: for i,j in it:
     ...:     print(i,j)
     ...:     i[...] = j
     ...:     
0 4
1 3
2 2
3 3
4 4
In [772]: x
Out[772]: array([4, 3, 2, 3, 4])

This produces the same sort of overlapping result as Numba.

Adding a copy makes for a clean reversal.

it = np.nditer((x,x[::-1].copy()), op_flags=[['readwrite'], ['readonly']])
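In NumPy versions newer than the one this answer was written against (the flag appeared around NumPy 1.13, and the `with` form needs 1.15 or later), nditer can make that copy itself via the `copy_if_overlap` flag, which copies read operands that overlap a write operand before iterating:

```python
import numpy as np

x = np.arange(5)
with np.nditer((x, x[::-1]),
               flags=['copy_if_overlap'],
               op_flags=[['readwrite'], ['readonly']]) as it:
    # The reversed (read-only) operand should now be a separate copy
    overlap = np.shares_memory(*it.operands)
    print(overlap)  # False
    for i, j in it:
        i[...] = j

print(x)  # [4 3 2 1 0]
```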

If I add the external_loop flag I also get a clean reversal:

In [781]: x=np.arange(5)
In [782]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']], flags=['external_loop'])
In [783]: for i,j in it:
     ...:     print(i,j)
     ...:     i[...] = j
     ...:     
[0 1 2 3 4] [4 3 2 1 0]
In [784]: x
Out[784]: array([4, 3, 2, 1, 0])