Cython fastest way to handle arrays

In my project I need to handle int arrays quickly. In particular, I want to create them and iterate over them as fast as possible.

I see NumPy recommended all over the place, because NumPy arrays can be accessed efficiently through Cython typed memoryviews, as described in the Cython documentation: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
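A Cython `int[:]` memoryview accepts any object that exports a one-dimensional buffer of C `int` items, which is why the NumPy array passed to it needs a matching dtype such as `np.intc`. The stdlib-only sketch below checks that condition with Python's built-in `memoryview`. It is a conservative approximation, not Cython's actual rule: the helper name `is_c_int_buffer` is hypothetical, and the standard-library `array` module stands in for a NumPy array.

```python
import struct
from array import array


def is_c_int_buffer(obj) -> bool:
    """Conservative check that obj could back a Cython ``int[:]`` view.

    Hypothetical helper: accepts only 1-D buffers whose items use the
    native C int format code 'i' (optionally prefixed by '@' or '=')
    and whose item size matches the platform's C int.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        return False  # obj does not export the buffer protocol at all
    with view:  # release the buffer as soon as the check is done
        return (
            view.ndim == 1
            and view.format.lstrip("@=") == "i"
            and view.itemsize == struct.calcsize("i")
        )


print(is_c_int_buffer(array("i", [1, 2, 3, 4, 5])))  # True
print(is_c_int_buffer(array("q", [1, 2, 3, 4, 5])))  # False: 64-bit items
print(is_c_int_buffer([1, 2, 3, 4, 5]))              # False: a list has no buffer
```

With NumPy installed, an array created with `dtype=np.intc` should pass the same check, while NumPy's default integer dtype on 64-bit Linux (`int64`) would not.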

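%%cython
import numpy as np

cpdef test_1(int[:] lst):
    # C array on the stack; it is not initialised, so ret[len(lst):] holds
    # garbage, and an input longer than 100 writes out of bounds.
    cdef int[100] ret
    # ii is undeclared; `cdef Py_ssize_t ii` would guarantee a C loop index.
    for ii in range(lst.shape[0]):
        ret[ii] = lst[ii]
    # Returning a C array makes Cython build a new list of 100 Python ints.
    return ret

cpdef test_2(int[:] lst):
    # np.empty is a Python-level call; its result is also uninitialised.
    cdef int[:] ret = np.empty(100, dtype=np.dtype("i"))
    for ii in range(lst.shape[0]):
        ret[ii] = lst[ii]
    # Returns a Cython memoryview object, not a NumPy array.
    return ret

cpdef test_3(list lst):
    # [0]*100 only copies 100 references to the cached int object 0.
    cdef list ret = [0]*100
    for ii in range(len(lst)):
        ret[ii] = lst[ii]
    return ret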

I expected the typed C array (test_1) to be much faster than the alternatives. Since NumPy is advertised everywhere, I also expected the memoryview over a NumPy array (test_2) to be reasonably fast, and the plain Python list created with [0]*100 (test_3) to be the slowest. To my surprise, timing the three tests gives the following:

import numpy as np
lst_1 = [1, 2, 3, 4, 5]
lst_2 = np.array(lst_1, dtype=np.intc)
%timeit test_1(lst_2)
%timeit test_2(lst_2)
%timeit test_3(lst_1)

1.3 µs ± 60.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.43 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
481 ns ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
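With only five input elements, the copy loops are a small part of each measurement, so these numbers mostly compare fixed per-call costs such as argument conversion, allocating `ret` and converting the return value. A hedged sketch of one way to separate the two: time a function at two input sizes and fit `time = fixed + per_item * n`. The names `split_cost`, `estimate_costs` and `copy_list` are hypothetical, and `copy_list` is a pure-Python stand-in for `test_3`, not the compiled function.

```python
import timeit


def split_cost(t_small, n_small, t_large, n_large):
    """Fit t = fixed + per_item * n through two (n, t) measurements."""
    if n_large == n_small:
        raise ValueError("need two different input sizes")
    per_item = (t_large - t_small) / (n_large - n_small)
    fixed = t_small - per_item * n_small
    return fixed, per_item


def estimate_costs(func, make_input, n_small=5, n_large=100, number=100_000):
    """Time func at two input sizes; return (fixed, per_item) in seconds.

    n_large defaults to 100 because test_1/test_2 in the question write
    into a fixed 100-element buffer. Timer noise can make either value
    slightly negative when the real cost is tiny.
    """
    times = []
    for n in (n_small, n_large):
        arg = make_input(n)  # build the input outside the timed region
        # best of 5 repeats, converted to seconds per call
        best = min(timeit.repeat(lambda: func(arg), number=number, repeat=5))
        times.append(best / number)
    return split_cost(times[0], n_small, times[1], n_large)


def copy_list(lst):
    # Pure-Python stand-in for test_3; swap in the compiled functions to measure them.
    ret = [0] * 100
    for ii in range(len(lst)):
        ret[ii] = lst[ii]
    return ret


fixed, per_item = estimate_costs(copy_list, lambda n: list(range(n)), number=10_000)
print(f"fixed ~ {fixed * 1e9:.0f} ns/call, per element ~ {per_item * 1e9:.1f} ns")
```

For the compiled `test_1` and `test_2`, an input such as `make_input=lambda n: np.arange(n, dtype=np.intc)` would be needed, keeping `n_large` at 100 or below so `test_1` stays inside its buffer.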

Why is the Python list the fastest? Am I doing something wrong when I create or access the memoryviews? How can I create and access int arrays as fast as possible in Cython?
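On the creation side, the standard-library `array` module is another typed, buffer-exporting container that a Cython `int[:]` parameter can accept, and repeating a one-element array is a cheap way to get a zero-filled one. The sketch below is plain Python under that assumption; `zeros_int_array` and `int_array_from` are hypothetical names. Inside Cython, the documentation's "Working with Python arrays" page describes `cpython.array.clone` for allocating such arrays from a template, which may be worth timing against `np.empty`.

```python
from array import array


def zeros_int_array(n: int) -> array:
    # Repeating a one-element array fills the new C-int buffer with zeros
    # without building an intermediate Python list of n ints.
    return array("i", [0]) * n


def int_array_from(values) -> array:
    # Copies an iterable of Python ints into a new C-int buffer;
    # raises OverflowError if a value does not fit in a C int.
    return array("i", values)


ret = zeros_int_array(100)
src = int_array_from([1, 2, 3, 4, 5])
view = memoryview(ret)      # the kind of buffer Cython's int[:] operates on
view[:len(src)] = src       # slice assignment copies the five C ints into ret
print(ret[:6].tolist())     # [1, 2, 3, 4, 5, 0]
```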
