I often need to sort large numpy arrays (a few billion elements), and sorting has become a bottleneck in my code. I am looking for a way to parallelize it.
Are there any parallel implementations of the ndarray.sort() function? The Numexpr module provides parallel implementations of most math operations on numpy arrays, but it lacks sorting capabilities.
Maybe it is possible to write a simple wrapper around a C++ implementation of parallel sorting and use it through Cython?
I ended up wrapping the parallel sort that ships with GCC (libstdc++ parallel mode). Here is the code:
parallelSort.pyx
Extra compiler arguments: -fopenmp (compiling) and -lgomp (linking).
This makefile will do it:
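An alternative to a makefile is a setuptools build script that passes the same two flags. The following is only a minimal sketch, assuming the Cython source is the parallelSort.pyx file named above and that the compiled module is also called parallelSort. The helper names make_parallel_sort_extension and main are made up for illustration.

```python
# setup.py: a sketch of a setuptools-based build, under the assumptions above.
from setuptools import Extension


def make_parallel_sort_extension():
    """Describe the extension module built from parallelSort.pyx."""
    return Extension(
        name="parallelSort",               # assumed module name
        sources=["parallelSort.pyx"],
        language="c++",                    # the wrapped sort is C++
        extra_compile_args=["-fopenmp"],   # enable OpenMP when compiling
        extra_link_args=["-lgomp"],        # link the GNU OpenMP runtime
    )


def main():
    # Imported here so the extension description above can be inspected
    # even when Cython is not installed.
    from Cython.Build import cythonize
    from setuptools import setup

    setup(
        name="parallelSort",
        ext_modules=cythonize([make_parallel_sort_extension()]),
    )
```

With a call to main() added at the end of the file, running python setup.py build_ext --inplace should place the compiled module next to the source.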
And this shows that it works:
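One way to run such a check is to compare the result against np.sort on random data and time the call. The sketch below is not the original test. It accepts any function that sorts a numpy array in place, so the wrapped parallel sort can be passed in under whatever name it was exported with. The name check_inplace_sort and its parameters are assumptions made for illustration.

```python
import time

import numpy as np


def check_inplace_sort(sort_inplace, n=1_000_000, seed=0):
    """Run sort_inplace on n random doubles and compare with np.sort.

    Returns (matches_reference, seconds_taken).
    """
    rng = np.random.default_rng(seed)
    data = rng.random(n)          # uniform float64 values in [0, 1)
    expected = np.sort(data)      # reference result; np.sort returns a copy

    start = time.perf_counter()
    sort_inplace(data)            # must sort `data` in place
    elapsed = time.perf_counter() - start

    return bool(np.array_equal(data, expected)), elapsed
```

Calling check_inplace_sort(lambda a: a.sort()) exercises the harness with numpy's own in-place sort. Passing the wrapped function instead gives the same correctness check and a timing for the parallel version. For arrays with billions of elements, n would need to be raised accordingly, memory permitting.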
Edit: fixed the bug pointed out in the comment below.