Faster way of computing the mean with pandas groupby + apply and condensing groups

Question

53 views Asked by LizzAlice At 19 October 2020 at 12:25

I want to group by two columns. If a group contains more than one row, I want to return only the first row of the group, with the value in column c replaced by the group mean. If a group contains only one row, I want to return that row unchanged. My code looks like this:

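def condense(group):
    # Groups with several rows collapse to their first row, with "c" set to the group mean.
    if group.shape[0] > 1:
        record = group.iloc[[0]].copy()  # copy so the assignment does not touch a slice of df
        record["c"] = group["c"].mean()
        return record
    # Single-row groups are returned unchanged.
    return group

# condense has to be defined before it is passed to apply.
final = df.groupby(["a", "b"]).apply(condense).drop(['a', 'b'], axis=1).reset_index()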

And the df looks something like this:

a      b     c   d
"f"   "e"    2   True
"f"   "e"    3   False
"c"   "a"    1   True

The data frame is quite large: it has 73800 groups, and the whole groupby + apply takes about a minute. This is far too long. Is there a way to make it run faster?


**jezrael** · Accepted Answer · 2020-10-19T12:31:59+00:00

The mean of a single value is that value itself, so the single-row case needs no special handling. You can therefore simplify the solution with GroupBy.agg, aggregating column c with mean and all other columns with first:

d = dict.fromkeys(df.columns.difference(['a','b']), 'first')
d['c'] = 'mean'
print (d)
{'c': 'mean', 'd': 'first'}

df = df.groupby(["a", "b"], as_index=False).agg(d)
print (df)
   a  b    c     d
0  c  a  1.0  True
1  f  e  2.5  True
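
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. It rebuilds the three sample rows from the question and uses named aggregation instead of a dict; the variable name `result` is only illustrative, and named aggregation is assumed to be available (pandas 0.25 or later).

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "a": ["f", "f", "c"],
    "b": ["e", "e", "a"],
    "c": [2, 3, 1],
    "d": [True, False, True],
})

# One vectorised aggregation instead of a Python-level apply per group:
# "c" becomes the group mean, "d" keeps the first value of each group.
result = df.groupby(["a", "b"], as_index=False).agg(
    c=("c", "mean"),
    d=("d", "first"),
)
print(result)
```

Note that column c comes back as float even for single-row groups, and that groups are sorted by key by default. Passing `sort=False` to `groupby` should keep the groups in order of first appearance if that matters.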
