This question is focused on the performance of a calculation.
I have two matrices with the same number of columns but different numbers of rows. One matrix is the 'pattern', whose rows have to be compared one by one against all rows of the other matrix, so that I can then extract statistics such as the mean number of entries equal to the pattern, the standard deviation, etc. The matrices and the computation are the following:
import numpy as np

numCols = 10
pattern = np.random.randint(0, 2, size=(7, numCols))
matrix = np.random.randint(0, 2, size=(5, numCols))

comp_mean = np.zeros(pattern.shape[0])
for i in range(pattern.shape[0]):
    comp_mean[i] = np.mean(np.sum(pattern[i, :] == matrix, axis=1))
print(comp_mean)  # Output example: [ 1.6  1.   1.6  2.2  2.   2.   1.6]
This is clear. The problem is that the number of rows in both matrices is much bigger (~1,000,000), so this code runs very slowly. I tried to rewrite it in vectorized NumPy syntax, as that sometimes surprises me by improving the calculation time. So I wrote the following code (it may look strange, but it works!):
comp_mean = np.mean(
    np.sum(
        (pattern[np.repeat(np.arange(pattern.shape[0]), matrix.shape[0])].ravel()
         == np.tile(matrix.ravel(), pattern.shape[0])
         ).reshape(pattern.shape[0], matrix.shape[0], matrix.shape[1]),
        axis=2),
    axis=1)
print(comp_mean)
However, this code is slower than the previous one with the 'for' loop. So I would like to know if there is any way to speed up the calculation.
EDIT
I have checked the runtime of the different approaches for the real matrix and the result is the following:
- Me - Approach 1: 18.04 seconds
- Me - Approach 2: 303.10 seconds
- Divakar - Approach 1: 18.79 seconds
- Divakar - Approach 2: 65.11 seconds
- Divakar - Approach 3.1: 137.78 seconds
- Divakar - Approach 3.2: 59.59 seconds
- Divakar - Approach 4: 6.06 seconds
EDIT(2)
The previous runs were performed on a laptop. I have now run the code on a desktop, leaving out the worst-performing approaches, and the new runtimes are:
- Me - Approach 1: 6.25 seconds
- Divakar - Approach 1: 4.01 seconds
- Divakar - Approach 2: 3.66 seconds
- Divakar - Approach 4: 3.12 seconds
A few approaches with broadcasting could be suggested here.

Approach #1
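The code for this approach is not included in this excerpt, so here is a sketch of what a direct broadcasting solution likely looks like (my reconstruction, not necessarily the answerer's exact code): insert a new axis into pattern so the equality comparison expands to a 3D boolean array covering every (pattern row, matrix row) pair at once, then reduce.

```python
import numpy as np

np.random.seed(0)
numCols = 10
pattern = np.random.randint(0, 2, size=(7, numCols))
matrix = np.random.randint(0, 2, size=(5, numCols))

# pattern[:, None, :] has shape (7, 1, 10); comparing against matrix
# (5, 10) broadcasts to a (7, 5, 10) boolean array in one operation.
comp_mean = (pattern[:, None, :] == matrix).sum(axis=2).mean(axis=1)
```

Note the intermediate array has shape (rows_pattern, rows_matrix, numCols), so memory grows with the product of the two row counts.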
Approach #2
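Again, the original code is missing here; a plausible sketch for this approach, given that the data is binary, is to replace the elementwise comparison with matrix multiplication: positions where two rows agree are positions where both are 1 plus positions where both are 0, and each count is a dot product.

```python
import numpy as np

np.random.seed(0)
numCols = 10
pattern = np.random.randint(0, 2, size=(7, numCols))
matrix = np.random.randint(0, 2, size=(5, numCols))

# For 0/1 data: agreements = (both ones) + (both zeros), each obtainable
# via a dot product, yielding a (7, 5) matrix of match counts.
matches = pattern.dot(matrix.T) + (1 - pattern).dot(1 - matrix.T)
comp_mean = matches.mean(axis=1)
```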
Approach #3
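The runtime tables above mention two variants, 3.1 and 3.2, whose code is also absent from this excerpt. As a hedged guess at what such variants could be, here are two alternative reductions of the same broadcast comparison: counting mismatches instead of matches, and summing with einsum.

```python
import numpy as np

np.random.seed(0)
numCols = 10
pattern = np.random.randint(0, 2, size=(7, numCols))
matrix = np.random.randint(0, 2, size=(5, numCols))

# Variant 3.1 (hypothetical): count mismatches, subtract from numCols.
comp_mean_31 = numCols - (pattern[:, None, :] != matrix).sum(axis=2).mean(axis=1)

# Variant 3.2 (hypothetical): reduce the broadcast comparison with einsum.
eq = (pattern[:, None, :] == matrix).astype(np.float64)
comp_mean_32 = np.einsum('ijk->ij', eq).mean(axis=1)
```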
Approach #4
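The fastest approach in the timings above is likewise not shown here. One sketch consistent with that speedup (an assumption on my part) uses a single matrix product via the Hamming-distance identity for binary vectors p and m: equal positions = numCols - |p| - |m| + 2·(p·m), since the Hamming distance is |p| + |m| - 2·(p·m).

```python
import numpy as np

np.random.seed(0)
numCols = 10
pattern = np.random.randint(0, 2, size=(7, numCols))
matrix = np.random.randint(0, 2, size=(5, numCols))

# One (7, 5) matrix product plus cheap row sums covers all row pairs.
pm = pattern.dot(matrix.T)
equal = numCols - pattern.sum(1)[:, None] - matrix.sum(1) + 2 * pm
comp_mean = equal.mean(axis=1)
```

This avoids materializing any 3D intermediate, which is likely why it scales best.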
If the number of rows in matrix is a huge number, it could be better to stick to a for-loop to avoid the huge memory requirements of such a case, which might also lead to slow runtimes. Instead, we could do some optimizations within each loop iteration. Here's one such possible optimization shown -