I have an m x 3 matrix A and its row subset B (n x 3). Both are sets of indices into another, large 4D matrix; their data type is dtype('int64'). I would like to generate a boolean vector x, where x[i] = True if B does not contain row A[i,:].
There are no duplicate rows in either A or B.
Is there an efficient way to do this in NumPy? I found an answer that's somewhat related: https://stackoverflow.com/a/11903368/265289; however, it returns the actual rows (not a boolean vector).
You could follow the same pattern as shown in jterrace's answer, except use np.in1d instead of np.setdiff1d. Negating the resulting membership mask yields the boolean vector x.
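A minimal sketch of that pattern, assuming A and B share the same dtype and number of columns. The helper name rows_not_in and the sample data are illustrative only, and np.isin is swapped in as the newer spelling of np.in1d (which NumPy 2.0 deprecates):

```python
import numpy as np

def rows_not_in(A, B, assume_unique=False):
    """Return x with x[i] True when row A[i] does not occur in B."""
    # One structured field per column, so each row becomes a single element.
    row_dtype = [('', A.dtype)] * A.shape[1]
    # view() needs C-contiguous memory, hence ascontiguousarray.
    Ad = np.ascontiguousarray(A).view(row_dtype)
    Bd = np.ascontiguousarray(B).view(row_dtype)
    # isin marks rows of A found in B; negate to flag the missing ones.
    return ~np.isin(Ad, Bd, assume_unique=assume_unique).ravel()

# Illustrative data: B is made of rows 1 and 3 of A.
A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], dtype=np.int64)
B = A[[1, 3]]
x = rows_not_in(A, B, assume_unique=True)
print(x)
```

With this sample data, x is True for rows 0 and 2 of A, which are the rows missing from B.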
You can use assume_unique=True (which can speed up the calculation) since there are no duplicate rows in A or B.

Beware that A.view(...) will raise an error if A.flags['C_CONTIGUOUS'] is False (i.e. if A is not a C-contiguous array). Therefore, in general we need to use np.ascontiguousarray(A) before calling view.

As B.M. suggests, you could instead view each row using the "void" dtype:
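One way the void-dtype variant might look, as a sketch rather than B.M.'s exact code. The helper name rows_not_in_void and the sample arrays are hypothetical, B is cast to A's dtype so both views have the same byte width, and np.isin again stands in for np.in1d:

```python
import numpy as np

def rows_not_in_void(A, B):
    """Boolean mask over rows of A: True where the row is absent from B."""
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B, dtype=A.dtype)
    # A single opaque "void" element spanning the bytes of one whole row.
    void_dtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    Av = A.view(void_dtype).ravel()
    Bv = B.view(void_dtype).ravel()
    # assume_unique=True relies on neither array having duplicate rows.
    return ~np.isin(Av, Bv, assume_unique=True)

# Illustrative data: subset is made of rows 0 and 2 of idx.
idx = np.array([[2, 0, 1], [5, 5, 5], [7, 3, 9]], dtype=np.int64)
subset = idx[[0, 2]]
mask = rows_not_in_void(idx, subset)
print(mask)
```

Here only the middle row of idx is missing from subset, so only mask[1] is True.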
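This is safe to use with integer dtypes. However, note that a void view compares raw bytes, and floats that compare equal can have different byte representations (for example, -0.0 and 0.0), so using np.in1d after viewing as void may return incorrect results for arrays with float dtype.

Here is a benchmark of some of the proposed methods: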