Optimizing access on numpy arrays for numba

Asked by Michael on 19 December 2013 at 10:39 · 3k views

I recently stumbled upon numba and thought about replacing some homemade C extensions with more elegant autojitted Python code. Unfortunately, my first quick benchmark was disappointing. numba does not seem to do much better than ordinary Python here, although I would have expected nearly C-like performance:

from numba import jit, autojit, uint, int_, double
import numpy as np
import imp
import logging
logging.getLogger('numba.codegen.debug').setLevel(logging.INFO)

def sum_accum(accmap, a):
    res = np.zeros(np.max(accmap) + 1, dtype=a.dtype)
    for i in xrange(len(accmap)):
        res[accmap[i]] += a[i]
    return res

autonumba_sum_accum = autojit(sum_accum)
numba_sum_accum = jit(double[:](int_[:], double[:]), 
                      locals=dict(i=uint))(sum_accum)

accmap = np.repeat(np.arange(1000), 2)
np.random.shuffle(accmap)
accmap = np.repeat(accmap, 10)
a = np.random.randn(accmap.size)

ref = sum_accum(accmap, a)
assert np.all(ref == numba_sum_accum(accmap, a))
assert np.all(ref == autonumba_sum_accum(accmap, a))

%timeit sum_accum(accmap, a)
%timeit autonumba_sum_accum(accmap, a)
%timeit numba_sum_accum(accmap, a)

accumarray = imp.load_source('accumarray', '/path/to/accumarray.py')
assert np.all(ref == accumarray.accum(accmap, a))

%timeit accumarray.accum(accmap, a)

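On my machine this gives (in the order of the %timeit calls above):

10 loops, best of 3: 52 ms per loop <- sum_accum
10 loops, best of 3: 42.2 ms per loop <- autonumba_sum_accum
10 loops, best of 3: 43.5 ms per loop <- numba_sum_accum
1000 loops, best of 3: 321 us per loop <- accumarray.accum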

I'm running the latest numba version from PyPI, 0.11.0. Any suggestions on how to fix the code so that it runs reasonably fast with numba?

There are 2 answers

**Michael** · Answer · 19 December 2013 at 11:42

I figured it out myself. numba wasn't able to determine the type of the result of np.max(accmap), even though the type of accmap was set to int. This somehow slowed everything down, but the fix is easy: assign the result to a local variable and declare its type via locals:

@autojit(locals=dict(reslen=uint))
def sum_accum(accmap, a):
    reslen = np.max(accmap) + 1
    res = np.zeros(reslen, dtype=a.dtype)
    for i in range(len(accmap)):
        res[accmap[i]] += a[i]
    return res

The result is quite impressive, taking about 2/3 of the runtime of the C version (321 us):

10000 loops, best of 3: 192 us per loop

Update 2022: The work on this issue led to the Python package numpy_groupies, which is available here:

https://github.com/ml31415/numpy-groupies
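
For readers on a current numba release, where `autojit` is no longer available, the same fix can be sketched with `@njit`. This is a minimal sketch, not the code that was timed above: the name `sum_accum_njit`, the fixed float64 result dtype, and the `try`/`except` fallback (which lets the snippet run, slowly, without numba installed) are assumptions made for illustration. In nopython mode a variable whose type cannot be inferred is a compile error rather than a silent slowdown.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without numba.
    def njit(func):
        return func


@njit
def sum_accum_njit(accmap, a):
    # Compute the output length once, as an integer, before allocating.
    reslen = accmap.max() + 1
    # Assumption: float64 output; the original used a.dtype instead.
    res = np.zeros(reslen, dtype=np.float64)
    for i in range(accmap.size):
        res[accmap[i]] += a[i]
    return res


accmap = np.array([0, 1, 0, 2, 1])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(sum_accum_njit(accmap, a))  # group sums: 4.0, 7.0, 4.0
```

The first call includes compilation time, so any timing comparison should be made on a second or later call.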

**M4rtini** · Accepted Answer · 2013-12-19T19:59:33+00:00

@autojit
def numbaMax(arr):
    MAX = arr[0]
    for i in arr:
        if i > MAX:
            MAX = i
    return MAX

@autojit
def autonumba_sum_accum2(accmap, a):
    res = np.zeros(numbaMax(accmap) + 1)
    for i in xrange(len(accmap)):
        res[accmap[i]] += a[i]
    return res

10 loops, best of 3: 26.5 ms per loop <- original
100 loops, best of 3: 15.1 ms per loop <- with numba but the slow numpy max
10000 loops, best of 3: 47.9 µs per loop <- with numbaMax
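
When comparing variants like the ones above, a NumPy-only reference can help confirm that each one still computes the same grouped sums. The sketch below uses `np.bincount` with `weights`, which is not part of either answer; the helper names `sum_accum_bincount` and `sum_accum_loop` and the small test arrays are made up for illustration, and it assumes `accmap` holds non-negative integers.

```python
import numpy as np


def sum_accum_bincount(accmap, a):
    # np.bincount adds a[i] into bin accmap[i]; the output has max(accmap) + 1 bins.
    return np.bincount(accmap, weights=a)


def sum_accum_loop(accmap, a):
    # Same explicit loop as in the question, written for Python 3.
    res = np.zeros(np.max(accmap) + 1, dtype=a.dtype)
    for i in range(len(accmap)):
        res[accmap[i]] += a[i]
    return res


accmap = np.array([0, 1, 0, 2, 1])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# allclose rather than == so that a different summation order cannot cause a false alarm.
assert np.allclose(sum_accum_bincount(accmap, a), sum_accum_loop(accmap, a))
print(sum_accum_bincount(accmap, a))  # group sums: 4.0, 7.0, 4.0
```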
