np.searchsorted
only for 1D arrays.
I have a lexicographically sorted 2D array, meaning that 0-th row is sorted, then for same values of 0-th row corresponding elements of 1-th row are sorted too, for same values of 1-th row values of 2-th row are sorted too. In other words tuples consisting of columns are sorted.
I have some other 2D array with tuples-columns that need to be inserted into first 2D array into correct positions of columns. For 1D case np.searchsorted
was usually used in order to find correct positions.
But for 2D array is there an alternative to np.searchsorted
? Something analagous to how np.lexsort is a 2D alternative for 1D np.argsort.
If no such function then can be this functionality implemented in an efficient way using existing numpy functions?
I am interested in efficient solutions for arrays of any dtype
including np.object_
.
One naive way to handle any dtype
case would be to convert each column of both arrays to 1D array (or tuple) and then store these columns as another 1D array of dtype = np.object_
. Maybe it is not that naive and could be even fast especially if columns are quite high.
I've created several more advanced strategies.
Also simple strategy using
tuples
like in another my answer is implemented.Timings of all solutions are measured.
Most of strategies are using
np.searchsorted
as underlying engine. To implement these advanced strategies a special wrapping class_CmpIx
was used in order to provide custom comparison function (__lt__
) fornp.searchsorted
call.py.tuples
strategy just converts all columns to tuples and stores them as numpy 1D array ofnp.object_
dtype and then doing regular searchsorting.py.zip
uses python's zip for lazily doing same task.np.lexsort
strategy just usesnp.lexsort
in order to compare two columns lexicographically.np.nonzero
usesnp.flatnonzero(a != b)
expression.cmp_numba
uses ahead of time compiled numba code inside_CmpIx
wrapper for fast lexicographically lazy comparing of two provided elements.np.searchsorted
uses standard numpy's function but is measured for 1D case only.numba
strategy whole search algorithm is implemented from scratch using Numba engine, algorithm is based on binary search. There is_py
and_nm
variants of this algorithm,_nm
is much faster as it uses Numba compiler, while_py
is same algorithm but un-compiled. Also there is_sorted
flavor which does extra optimization of array to be inserted is already sorted.view1d
- methods suggested by @MadPhysicist in this answer. Commented out them in code, because they were returning incorrect answers for most of tests for all key lengths >1, probably due to some problems of raw viewing into array.Try it online!
outputs:
As it appears from timings
numba_nm
implementation is the fastest, it outperforms next fastest (py.zip
orpy.tuples_cached
) by15-100x
times. And it has comparable speed (1.85x
slower) to standardnp.searchsorted
for 1D case. Also it appeared to be that_sorted
flavor doesn't improve situation (i.e. using information about inserted array being sorted).cmp_numba
method that is machine-code compiled appears to be around1.5x
times faster on average thanpy.zip
that does same algorithm but in pure python. Due to average maximum equal-key depth being around15-18
elements numba doesn't gain much speedup here. If depth was hundreds then numba code would probably have a huge speedup.py.tuples_cached
strategy is faster thanpy.zip
for the case of key length<= 100
.Also it appears to be that
np.lexsort
is in fact very slow, either it is not optimized for the case of just two columns, or it spends time doing preprocessing like splitting rows into list, or it does non-lazy lexicographical comparison, the last case is probably the real reason as lexsort slows down with key length grow.Strategy
np.nonzero
is also non-lazy hence works slow too, and slows down with key length growth (but slows down not that fast asnp.lexsort
does).Timings above may be not precise, because my CPU slows down cores frequency 2-2.3 times at random times whenever it is overheated, and it overheats often because it is a powerful CPU inside laptop.