numpy mean crashing for large arrays


I have a list of one-dimensional arrays of different lengths and I want to calculate the mean across the arrays, position by position. This "tolerant" mean therefore averages a different number of samples at each position.
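
As a small made-up illustration (three arrays of lengths 3, 2 and 1), the mean at each position should only average the arrays that are long enough to reach it:

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0]
c = [6.0]
# position 0 averages 3 samples, position 1 averages 2, position 2 averages 1:
# tolerant mean = [(1 + 4 + 6) / 3, (2 + 5) / 2, 3 / 1] = [3.667, 3.5, 3.0]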

To do this I built a new (n, m) array (with n the number of one-dimensional arrays and m the maximal array length) by padding each one-dimensional array with zeros so that they all have the same length. I then used a numpy masked array (np.ma) to exclude the zero padding from the mean calculation.

Defining the numpy masked array seems to be fine, but when I run np.mean and np.std the program crashes (the kernel dies in the Jupyter notebook). I suspect this is related to memory, because the array is large (1964035, 2574), but I was surprised that np.mean crashes even though defining the array does not.
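
For scale, here is my rough back-of-the-envelope estimate of the footprint, assuming float64 data and a boolean mask, and not counting whatever temporaries the masked mean might allocate internally:

n, m = 1964035, 2574
print(n * m * 8 / 1e9)   # ~40 GB for the float64 data
print(n * m * 1 / 1e9)   # ~5 GB for the boolean mask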

Here is my code to define the masked array

import numpy as np

myArrays = [np.random.randn(np.random.randint(2000)) for i in range(1000000)]  # only for this example

maxLen = max(myArray.shape[0] for myArray in myArrays)

# pad each array with zeros up to maxLen, and mask the padded entries
myMasked = np.ma.array(
    [np.concatenate([myArray, np.zeros(maxLen - myArray.shape[0])])
     for myArray in myArrays],
    mask=[np.concatenate([np.zeros(myArray.shape[0], dtype=bool),
                          np.ones(maxLen - myArray.shape[0], dtype=bool)])
          for myArray in myArrays])
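
On a tiny hand-made version of the same construction the result looks right, so the construction itself seems correct and the problem only shows up at full size:

small = np.ma.array([[1., 2., 3.],
                     [4., 5., 0.],
                     [6., 0., 0.]],
                    mask=[[False, False, False],
                          [False, False, True],
                          [False, True, True]])
print(small.mean(axis=0))   # [3.6666..., 3.5, 3.0], as expected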

Then running np.mean makes the program crash

myMean = np.mean(myMasked, axis=0)   
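
In case it is relevant: a padding-free way to get the same tolerant mean (just a sketch, reusing myArrays and maxLen from above) would be to accumulate per-position sums and counts instead of building the big padded array:

sums = np.zeros(maxLen)
counts = np.zeros(maxLen)
for myArray in myArrays:
    n = myArray.shape[0]
    sums[:n] += myArray
    counts[:n] += 1
tolerantMean = sums / counts   # counts is never zero: the longest array covers every position
# the std could be obtained the same way by also accumulating sums of squares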