Numba corrupts data by affecting in-place

Asked by David Brochart on 06 January 2017 at 23:29

Numba and NumPy don't execute the following foo function in the same way:

from numba import jit
import numpy as np

@jit
def foo(a):
    a[:] = a[::-1] # reverse the array

a = np.array([0, 1, 2])
foo(a)
print(a)

With NumPy (without @jit) it prints [2, 1, 0], while with Numba (with @jit) it prints [2, 1, 2]. It looks like Numba writes to the array in place while it is still reading from it, which leads to data corruption. It is easy to work around by making a copy of the array:

a[:] = a[::-1].copy()

But is this the desired behavior? Shouldn't Numba and NumPy give the same result?

I am using Numba v0.26.0 in Python 3.5.2.

There are 2 answers

hpaulj On 07 January 2017 at 00:20

Your jit has the same sort of in-place problems that this Python loop does.

In [718]: x=list(range(3))
In [719]: for i in range(3):
     ...:     x[i] = x[2-i]
In [720]: x
Out[720]: [2, 1, 2]

The x[:] = x[::-1] assignment is buffered, not because numpy recognizes that something special is happening, but because it always uses some sort of buffering when doing assignments.

The Python interpreter translates [] notation into calls to __setitem__ and __getitem__. So 681 and 682 do the same thing (each one reverses x, which is why x is back in its original order at 683):

In [680]: x=np.arange(3)
In [681]: x[:] = x[::-1]
In [682]: x.__setitem__(slice(None), x.__getitem__(slice(None,None,-1)))
In [683]: x
Out[683]: array([0, 1, 2])

That means that x[::-1] is evaluated in full before __setitem__ is called. But x[::-1] is a view, not a copy, so it still shares its data buffer with x, and the setitem step must do some sort of buffered copy to get a clean reversal.

Another way to do this copy is with

np.copyto(x, x[::-1])

Checking x.__array_interface__, I see that the data buffer address remains the same. So it is copying values, not just changing the data buffer address. But that happens in low-level compiled code.
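
The check described above can be sketched as follows. This is only an illustration: np.shares_memory is used here as an extra way to confirm that the reversed view overlaps x, and the variable names are made up for the example.

```python
import numpy as np

x = np.arange(3)
rev = x[::-1]                      # a view: no data is copied here

# The view and the original array use the same underlying buffer.
overlaps = np.shares_memory(x, rev)

# Record the buffer address before the assignment.
addr_before = x.__array_interface__['data'][0]

np.copyto(x, rev)                  # same effect as x[:] = x[::-1]

addr_after = x.__array_interface__['data'][0]

print(overlaps)                    # True
print(x)                           # [2 1 0]
print(addr_before == addr_after)   # True
```

The values are reversed inside the existing buffer; x is not rebound to new memory.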

Usually buffering is just an implementation issue that users don't need to worry about. ufunc.at is designed to deal with cases where that buffering creates problems. This topic comes up periodically; search for add.at.
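
For reference, here is a small example of the kind of case ufunc.at handles: an in-place add with a repeated index. The array names are made up for the illustration.

```python
import numpy as np

idx = np.array([0, 0, 1])          # index 0 appears twice

buffered = np.zeros(3, dtype=int)
buffered[idx] += 1                 # buffered: the repeated index is counted once

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)      # unbuffered: every occurrence is applied

print(buffered)                    # [1 1 0]
print(unbuffered)                  # [2 1 0]
```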

=============

Note that Python lists behave the same way. The translation to 'get/setitem' is the same.

In [699]: x=list(range(3))
In [700]: x[:] = x[::-1]
In [701]: x
Out[701]: [2, 1, 0]

======================

I'm not sure whether this is relevant, but since I tested these ideas I'll document them. https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html suggests using np.nditer as a stepping stone for implementing iterative tasks in Cython.

A first stab at using nditer is:

In [769]: x=np.arange(5)
In [770]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']])
In [771]: for i,j in it:
     ...:     print(i,j)
     ...:     i[...] = j
     ...:     
0 4
1 3
2 2
3 3
4 4
In [772]: x
Out[772]: array([4, 3, 2, 3, 4])

This produces the same sort of overlapping result as numba.

Adding a copy makes for a clean reversal.

it = np.nditer((x,x[::-1].copy()), op_flags=[['readwrite'], ['readonly']])

If I add the external_loop flag I also get a clean reversal:

In [781]: x=np.arange(5)
In [782]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']], flags=['external_loop'])
In [783]: for i,j in it:
     ...:     print(i,j)
     ...:     i[...] = j
     ...:     
[0 1 2 3 4] [4 3 2 1 0]
In [784]: x
Out[784]: array([4, 3, 2, 1, 0])

sklam On 09 January 2017 at 17:07 BEST ANSWER

This is a known issue (https://github.com/numba/numba/issues/1960), and it was fixed in Numba 0.27. Following NumPy's behavior, the fix detects overlap and makes temporary copies to avoid corrupting the data.
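
The idea of "detect overlap, then copy" can be sketched in plain Python and NumPy. This is not Numba's actual implementation; naive_assign and safe_assign are hypothetical helpers, and np.shares_memory stands in for whatever overlap test the compiled code really uses.

```python
import numpy as np

def naive_assign(dst, src):
    """Element-by-element copy, like a compiled loop with no overlap check."""
    for i in range(dst.shape[0]):
        dst[i] = src[i]

def safe_assign(dst, src):
    """Hypothetical helper: copy src first if it shares memory with dst."""
    if np.shares_memory(dst, src):
        src = src.copy()           # temporary copy breaks the overlap
    naive_assign(dst, src)

a = np.array([0, 1, 2])
naive_assign(a, a[::-1])           # reads elements that were already overwritten
print(a)                           # [2 1 2]

b = np.array([0, 1, 2])
safe_assign(b, b[::-1])            # overlap detected, so a copy is made first
print(b)                           # [2 1 0]
```

The naive loop reproduces the [2, 1, 2] result from the question, and the guarded version gives the clean reversal at the cost of one temporary array, paid only when the operands overlap.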
