How to reduce a pandas Series by performing an operation on every set of N sequential elements

Question


Asked by Dagorodir on 10 July 2020 at 01:31 (537 views)

Say I have a pandas Series, and I want to take the mean of every set of 8 consecutive rows. I don't know the size of the Series in advance, and the index may not be 0-based. I currently have the following:

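import numpy as np
import pandas as pd

N = 8
s = pd.Series(np.random.random(50 * N))

n_sets = s.shape[0] // N

# Start and end positions of each complete block of N elements
split = ([m * N for m in range(n_sets)],
         [m * N for m in range(1, n_sets + 1)])

out_array = np.zeros(n_sets)
for i, (a, b) in enumerate(zip(*split)):
    # s.index[a:b] turns positions into labels, so a non-0-based index still works
    out_array[i] = s.loc[s.index[a:b]].mean()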

Is there a shorter way to do this?
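For comparison, here is a sketch of the same loop written with positional slicing through `Series.iloc`, which skips the step of mapping positions to labels. The function name `block_means_loop` and the example index starting at 100 are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd


def block_means_loop(s: pd.Series, n: int) -> np.ndarray:
    """Mean of each complete block of n consecutive elements, by position."""
    n_sets = len(s) // n          # an incomplete trailing block is dropped
    out = np.empty(n_sets)
    for i in range(n_sets):
        # .iloc slices by position, so the index labels are never consulted
        out[i] = s.iloc[i * n:(i + 1) * n].mean()
    return out


N = 8
# Hypothetical Series whose index starts at 100 instead of 0
s = pd.Series(np.random.random(50 * N), index=np.arange(100, 100 + 50 * N))
means = block_means_loop(s, N)
print(means.shape)  # (50,)
```

Like the original, this still loops in Python; it only removes the label lookup.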

Original Q&A


**MrNobody33** · Accepted Answer · 2020-07-10T01:46:06+00:00

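You could use groupby with a key that puts every N consecutive index values in the same group: floor-dividing the index by N (s.index // N) maps labels 0 to N-1 to group 0, labels N to 2N-1 to group 1, and so on. Then take the mean of each group with pd.Series.mean():

newout_array = s.groupby(s.index // N).mean().to_list()

This relies on the default 0-based integer index. For an arbitrary index, group by position instead, e.g. s.groupby(np.arange(len(s)) // N).mean().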
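Which key you group on matters once the index does not start at 0. A minimal sketch, using a hypothetical Series of the values 0 to 19 whose index runs from 5 to 24, contrasts grouping on the labels (`s.index // N`) with grouping on the positions (`np.arange(len(s)) // N`):

```python
import numpy as np
import pandas as pd

N = 8

# Hypothetical data: the values 0..19 with an index that starts at 5, not 0
s = pd.Series(np.arange(20, dtype=float), index=np.arange(5, 25))

# Key from the index labels: labels 5-7 form group 0, labels 8-15 group 1, ...
# so the first "block" holds only 3 values and the rest are shifted.
by_label = s.groupby(s.index // N).mean()

# Key from the positions 0, 1, 2, ...: every group is N consecutive rows,
# whatever the labels are. A shorter final group is kept if len(s) % N != 0.
by_position = s.groupby(np.arange(len(s)) // N).mean()

# Drop the incomplete trailing group to match the question's loop
complete_only = by_position.iloc[: len(s) // N]

print(by_label.to_list())       # [1.0, 6.5, 14.5, 19.0]
print(by_position.to_list())    # [3.5, 11.5, 17.5]
print(complete_only.to_list())  # [3.5, 11.5]
```

Note that groupby keeps a shorter last group when the length is not a multiple of N, whereas the question's loop discards it. With 50 * N elements the two behave the same.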

Output:

out_array  #original solution
[0.42147899 0.55668055 0.5222594  0.46066426 0.44378491 0.52719371
 0.42479113 0.46485387 0.2800083  0.57174865 0.59207811 0.58665479
 0.52414851 0.38158931 0.51884761 0.59007469 0.3449512  0.56385373
 0.34359674 0.44524997 0.44175351 0.42339394 0.5687501  0.3140091
 0.40985639 0.46649486 0.3101396  0.45664647 0.51829052 0.38875796
 0.45428001 0.52979064 0.62545921 0.64782618 0.65265239 0.56976799
 0.64277369 0.33528876 0.45973874 0.45341751 0.52690983 0.66427599
 0.59814577 0.35575622 0.62995929 0.61582329 0.38971679 0.4771326
 0.50889137 0.25105353]


newout_array  #new solution

[0.4214789945860148, 0.5566805507021909, 0.5222593998859411, 0.46066425607167216, 0.4437849132421554, 0.5271937114894408,
 0.424791134573943, 0.4648538659945887, 0.28000829556024387, 0.5717486453029332, 0.5920781058695997, 0.5866547941460012, 
 0.5241485100329547, 0.38158931177460725, 0.5188476113762392, 0.5900746905953183, 0.34495119855714756, 0.5638537286251522, 
 0.3435967359945349, 0.44524997190104454, 0.44175351484451975, 0.42339393886425913, 0.5687501027416468, 0.3140090963728155, 
 0.40985639015924036, 0.4664948621046134, 0.3101396034068746, 0.45664647332866076, 0.5182905157666298, 0.38875796468438406, 
 0.4542800111275337, 0.5297906368971982, 0.6254592119278896, 0.6478261817988752, 0.6526523935382951, 0.569767994485338, 
 0.642773691835847, 0.3352887578683835, 0.45973873832126594, 0.45341751320112617, 0.5269098312525405, 0.6642759923683706, 
 0.5981457683986061, 0.3557562229383897, 0.6299592930489117, 0.6158232897272005, 0.38971678834383916, 0.4771325988592886, 
 0.5088913710936904, 0.25105352820427246]
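Rather than comparing the two printouts by eye, the results can be checked numerically. The sketch below rebuilds both versions on fresh random data and also includes a NumPy reshape variant; the reshape is an illustrative addition, not part of the answer, and it assumes you only want complete blocks of N.

```python
import numpy as np
import pandas as pd

N = 8
s = pd.Series(np.random.random(50 * N))
n_sets = len(s) // N

# The question's loop: label-based lookup of each positional block
out_array = np.array(
    [s.loc[s.index[m * N:(m + 1) * N]].mean() for m in range(n_sets)]
)

# The accepted answer: group by index // N (relies on the default 0-based index)
newout_array = s.groupby(s.index // N).mean().to_list()

# Illustrative NumPy alternative (not from the answer): view the complete
# blocks as an (n_sets, N) array and average along each row
reshaped = s.to_numpy()[: n_sets * N].reshape(n_sets, N).mean(axis=1)

# Different summation paths can differ in the last bits, so compare with
# a tolerance instead of exact equality
print(np.allclose(out_array, newout_array))  # True
print(np.allclose(out_array, reshaped))      # True
```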

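The two outputs only look different because of how they are displayed: NumPy prints array elements with at most 8 decimal places by default, while a Python list shows each float's full repr. The values themselves agree (up to floating-point rounding). If you do want the list rounded to 8 decimals, like the printed out_array, you can round each element: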

newout_array = s.groupby(s.index // N).mean().to_list()
newout_array = [round(x, 8) for x in newout_array]
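A small check of the display explanation, using the first value from the outputs above: NumPy's default print precision only shortens what is shown, while round produces a different float.

```python
import numpy as np

x = 0.4214789945860148  # first mean from the outputs above

# NumPy prints array elements with at most 8 decimal places by default
print(np.get_printoptions()["precision"])  # 8
print(np.array([x]))                       # [0.42147899]

# A Python list shows each float's full repr instead
print([x])                                 # [0.4214789945860148]

# round() returns a new, different float; printing with lower precision does not
print(round(x, 8))                         # 0.42147899
```

If the goal is only a shorter printout, changing NumPy's print options (np.set_printoptions) or formatting at print time avoids altering the data.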
