Python multiple lists of different lengths, averages and standard deviations


Given the list of lists below, I want to create new lists giving the average and standard deviation of each column.

a = [ [1, 2, 3],
      [2, 3, 4],
      [3, 4, 5, 6],
      [1, 2],
      [7, 2, 3, 4]]

Required result

mean =  2.8, 2.6, 3.75, 5
STDEV=  2.48997992, 0.894427191, 0.957427108, 1.414213562

I found the example below for computing the averages, which seems to work very well, but it wasn't clear to me how to adapt it for the standard deviation.

import numpy as np
import numpy.ma as ma
from itertools import zip_longest

a = [ [1, 2, 3],
      [2, 3, 4],
      [3, 4, 5, 6],
      [1, 2],
      [7, 2, 3, 4]]


averages = [np.ma.average(ma.masked_values(temp_list, None)) for temp_list in zip_longest(*a)]


print(averages)

There is 1 answer

sacuL On BEST ANSWER

You can use these two lines (with numpy imported as np and zip_longest imported from itertools, as in your question):

>>> np.nanmean(np.array(list(zip_longest(*a)),dtype=float),axis=1)
array([2.8 , 2.6 , 3.75, 5.  ])

>>> np.nanstd(np.array(list(zip_longest(*a)),dtype=float),axis=1,ddof=1)
array([2.48997992, 0.89442719, 0.95742711, 1.41421356])

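nanmean and nanstd compute the mean and standard deviation respectively, ignoring NaN values. zip_longest pads the shorter lists with None, which becomes nan when the result is converted with dtype=float, so you are passing them the array: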

>>> np.array(list(zip_longest(*a)),dtype=float)
array([[ 1.,  2.,  3.,  1.,  7.],
       [ 2.,  3.,  4.,  2.,  2.],
       [ 3.,  4.,  5., nan,  3.],
       [nan, nan,  6., nan,  4.]])

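With axis=1, the mean and standard deviation are computed for each row of this array (each column of your original data), ignoring NaNs. The ddof argument stands for "delta degrees of freedom": the divisor used is N - ddof, where N is the number of non-NaN values. I set it to 1, which gives the sample standard deviation, to match your desired output (the default is 0).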
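
Putting it together, here is a minimal self-contained sketch of the approach above. The variable names (columns, means, sample_std, population_std) are only illustrative, and passing fillvalue=np.nan explicitly is my own choice to make the padding visible; the default None padding gives the same array once dtype=float is applied.

```python
from itertools import zip_longest

import numpy as np

a = [[1, 2, 3],
     [2, 3, 4],
     [3, 4, 5, 6],
     [1, 2],
     [7, 2, 3, 4]]

# Transpose the ragged rows into columns, padding short rows with NaN.
# Each row of `columns` holds one column of the original data.
columns = np.array(list(zip_longest(*a, fillvalue=np.nan)), dtype=float)

# Reduce along axis=1 so each original column yields one value.
means = np.nanmean(columns, axis=1)
sample_std = np.nanstd(columns, axis=1, ddof=1)      # divisor N - 1
population_std = np.nanstd(columns, axis=1, ddof=0)  # divisor N (NumPy default)

print(means)
print(sample_std)
print(population_std)
```

Note that with ddof=1 a column containing only a single value has no defined sample standard deviation, so nanstd returns nan for it (with a warning). That does not happen with this data, since every column has at least two values.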