Is there an efficient way to calculate the sum of bits in each column of a packed array in Python?
Example (Python 3.7 and Numpy 1.20.1):
- Create a numpy array with values 0 or 1

import numpy as np

array = np.array(
    [
        [1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
    ]
)
- Compress the size with np.packbits
pack_array = np.packbits(array, axis=1)
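(For this input, pack_array is array([[160], [224], [32]], dtype=uint8): np.packbits puts the first column in the most significant bit of each byte and pads each row with zero bits up to a full byte.)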
- Expected result: the sum of bits at each position (column), without using np.unpackbits, i.e. the same as array.sum(axis=0):

array([2, 1, 3])
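For reference, the unpacking approach I want to avoid looks like this (count=array.shape[1] drops the padding bits of each row):

np.unpackbits(pack_array, axis=1, count=array.shape[1]).sum(axis=0)
# -> array([2, 1, 3])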
So far I have found only a very slow solution:
dim = array.shape[1]
# One packed one-hot row per column: candidate row c has only bit c set
candidates = np.zeros((dim, dim), dtype=int)
np.fill_diagonal(candidates, 1)
pack_candidates = np.packbits(candidates, axis=1)
# For each candidate, count the rows of pack_array that contain its bit
np.apply_along_axis(
    lambda c: np.sum((np.bitwise_and(pack_array, c) == c).all(axis=1)),
    1,
    pack_candidates,
)
Using np.unpackbits can be problematic if the input array is big, since the resulting array can be too big to fit in RAM, and even if it does fit in RAM, this is far from efficient: the huge temporary array has to be written to and read back from the (slow) main memory. The same thing applies to CPU caches: smaller arrays can generally be computed faster. Moreover, np.unpackbits has quite a big overhead for small arrays.

AFAIK, it is not possible to do this operation very efficiently in Numpy while using a small amount of RAM (i.e. without np.unpackbits, as pointed out by @mathfux). However, Numba can be used to speed up this computation, especially for small arrays. Here is the code:
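The following is a minimal sketch of that idea rather than a drop-in routine: the name packed_colsum is illustrative, and it assumes the input was packed with np.packbits(..., axis=1), which stores the first column in the most significant bit of each byte.

import numba as nb
import numpy as np

@nb.njit
def packed_colsum(packed, d):
    # d is the number of original (unpacked) columns
    n = packed.shape[0]
    res = np.zeros(d, dtype=np.int64)
    for i in range(n):
        for j in range(d):
            # Bit j of a row lives in byte j // 8, at big-endian position 7 - j % 8
            res[j] += (packed[i, j // 8] >> (7 - j % 8)) & 1
    return res

packed_colsum(pack_array, array.shape[1])  # -> array([2, 1, 3])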
If you want a faster implementation, you can optimize the code by working on fixed-size tiles. However, this also makes the code more complex. Here is the resulting code:
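Again a sketch under the same assumptions (the name packed_colsum_tiled and the tile size of 1024 rows are illustrative): rows are processed in fixed-size tiles so each thread accumulates into its own partial counters, and the tiles are processed in parallel with prange.

import numba as nb
import numpy as np

@nb.njit(parallel=True)
def packed_colsum_tiled(packed, d, tile=1024):
    n, m = packed.shape
    ntiles = (n + tile - 1) // tile
    # One row of partial sums per tile so parallel threads never share counters
    partial = np.zeros((ntiles, d), dtype=np.int64)
    for t in nb.prange(ntiles):
        for i in range(t * tile, min((t + 1) * tile, n)):
            for b in range(m):
                byte = packed[i, b]
                if byte != 0:  # cheap skip of all-zero bytes
                    # The last byte of a row may hold fewer than 8 real columns
                    for k in range(min(8, d - b * 8)):
                        partial[t, b * 8 + k] += (byte >> (7 - k)) & 1
    # Sequential reduction of the per-tile partial sums
    res = np.zeros(d, dtype=np.int64)
    for t in range(ntiles):
        for j in range(d):
            res[j] += partial[t, j]
    return res

packed_colsum_tiled(pack_array, array.shape[1])  # -> array([2, 1, 3])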
Here are performance results on my machine:
The memory footprint of the Numba implementations is much better too (at least 8 times smaller): they read the packed bytes directly instead of materializing an unpacked temporary, which is 8 times larger than the packed input.