How to reduce a pandas Series by performing an operation on every set of N sequential elements

Say I have a pandas Series, and I want to take the mean of every set of 8 rows. I don't know the size of the Series in advance, and the index may not be 0-based. I currently have the following:

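import numpy as np
import pandas as pd

N = 8
s = pd.Series(np.random.random(50 * N))

# Number of complete sets of N rows
n_sets = s.shape[0] // N

# Start and end positions of each set
split = ([m * N for m in range(n_sets)],
         [m * N for m in range(1, n_sets + 1)])

out_array = np.zeros(n_sets)
for i, (a, b) in enumerate(zip(*split)):
    # Slice by position so that a non-0-based index still works
    out_array[i] = s.loc[s.index[a:b]].mean()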

Is there a shorter way to do this?

There is 1 answer

MrNobody33 On BEST ANSWER

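You could try groupby: integer-dividing the index by N gives every run of N consecutive rows the same group label, and pd.Series.mean() then averages each group. Note that s.index // N assumes the default 0-based integer index; for any other index the labels have to be built from row positions, for example with np.arange(len(s)) // N.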

newout_array = s.groupby(s.index // N).mean().to_list()
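Since the question says the index may not be 0-based, here is a minimal sketch of the same idea with position-based group labels. The example data (a Series whose index starts at 100, with 3 leftover rows) and the names groups, means and full_means are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

N = 8

# Hypothetical example data: 50 full sets plus 3 leftover rows, with an
# index that starts at 100 instead of 0.
values = np.arange(50 * N + 3, dtype=float)
s = pd.Series(values, index=np.arange(100, 100 + len(values)))

# Position-based group labels: 0 for the first N rows, 1 for the next N, ...
# These do not depend on the index values at all.
groups = np.arange(len(s)) // N

# One mean per group; the last group here holds only the 3 leftover rows.
means = s.groupby(groups).mean()

# Optionally drop the trailing partial group so the result matches the loop
# in the question, which only averages complete sets of N rows.
n_sets = len(s) // N
full_means = means.iloc[:n_sets].to_list()
```

Unlike the loop in the question, groupby also returns a mean for a trailing partial set, which is why the last step trims the result to n_sets entries.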

Output:

out_array  #original solution
[0.42147899 0.55668055 0.5222594  0.46066426 0.44378491 0.52719371
 0.42479113 0.46485387 0.2800083  0.57174865 0.59207811 0.58665479
 0.52414851 0.38158931 0.51884761 0.59007469 0.3449512  0.56385373
 0.34359674 0.44524997 0.44175351 0.42339394 0.5687501  0.3140091
 0.40985639 0.46649486 0.3101396  0.45664647 0.51829052 0.38875796
 0.45428001 0.52979064 0.62545921 0.64782618 0.65265239 0.56976799
 0.64277369 0.33528876 0.45973874 0.45341751 0.52690983 0.66427599
 0.59814577 0.35575622 0.62995929 0.61582329 0.38971679 0.4771326
 0.50889137 0.25105353]


newout_array  #new solution

[0.4214789945860148, 0.5566805507021909, 0.5222593998859411, 0.46066425607167216, 0.4437849132421554, 0.5271937114894408,
 0.424791134573943, 0.4648538659945887, 0.28000829556024387, 0.5717486453029332, 0.5920781058695997, 0.5866547941460012, 
 0.5241485100329547, 0.38158931177460725, 0.5188476113762392, 0.5900746905953183, 0.34495119855714756, 0.5638537286251522, 
 0.3435967359945349, 0.44524997190104454, 0.44175351484451975, 0.42339393886425913, 0.5687501027416468, 0.3140090963728155, 
 0.40985639015924036, 0.4664948621046134, 0.3101396034068746, 0.45664647332866076, 0.5182905157666298, 0.38875796468438406, 
 0.4542800111275337, 0.5297906368971982, 0.6254592119278896, 0.6478261817988752, 0.6526523935382951, 0.569767994485338, 
 0.642773691835847, 0.3352887578683835, 0.45973873832126594, 0.45341751320112617, 0.5269098312525405, 0.6642759923683706, 
 0.5981457683986061, 0.3557562229383897, 0.6299592930489117, 0.6158232897272005, 0.38971678834383916, 0.4771325988592886, 
 0.5088913710936904, 0.25105352820427246]

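The two results hold the same values; they differ only in how many decimals each format displays. NumPy prints arrays with 8 digits of precision by default, while the Python list shows each float in full. If you want only 8 decimals, as in the printed out_array, you could map the elements with the round function: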

newout_array = s.groupby(s.index // N).mean().to_list()
newout_array = list(map(lambda x: round(x, 8), newout_array))
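If the goal is only to check that both approaches agree, rounding is not needed. The sketch below compares them with np.allclose. The seeded generator and the list-comprehension form of the loop are assumptions made here so that the example is reproducible and short.

```python
import numpy as np
import pandas as pd

N = 8
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
s = pd.Series(rng.random(50 * N))

# Loop-based result, equivalent to the code in the question.
n_sets = len(s) // N
out_array = np.array([s.iloc[m * N:(m + 1) * N].mean() for m in range(n_sets)])

# groupby-based result, as in the answer (valid here: default 0-based index).
newout_array = s.groupby(s.index // N).mean().to_list()

# True when the two results agree to within floating-point tolerance.
same = np.allclose(out_array, newout_array)
```

Here same should evaluate to True, which shows that the apparent mismatch in the printed output comes from display precision rather than from the computation.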