How to group a dataframe and summarize over subgroups of consecutive numbers in Python?

Question

How to group a dataframe and summarize over subgroups of consecutive numbers in Python?

238 views Asked by Facundo Iannello At 21 November 2018 at 16:29

I have a dataframe with a column containing ids and other column containing numbers:

df1 = {'ID':[400, 400, 400, 400, 400, 400, 500, 500, 500, 500], 
      'Number':[1, 2, 3, 4, 8, 9, 22, 23, 26, 27]}

You may note that each Id has their correponding series of consecutive numbers in the column "Number". For example:

Id 400 contains a series of length 4 {1, 2, 3, 4} and another of length 2 {8, 9}

I´d like to obtain for each Id, the average length of their corresponding series. In this example:

df2 = {'ID':[400, 500], 'avg_length':[3, 2]}

Any ideas will be much appreciated!

Original Q&A

There are 2 answers

jpp On 21 November 2018 at 16:56

`groupby` + `cumsum` + `value_counts`

You can use groupby with a custom function:

df = pd.DataFrame({'ID':[400, 400, 400, 400, 400, 400, 500, 500, 500, 500], 
                   'Number':[1, 2, 3, 4, 8, 9, 22, 23, 26, 27]})

def mean_count(x):
    return (x - x.shift()).ne(1).cumsum().value_counts().mean()

res = df.groupby('ID')['Number'].apply(mean_count).reset_index()

print(res)

    ID  Number
0  400     3.0
1  500     2.0

**Vaishali** · Accepted Answer · 2018-11-21T16:48:22+00:00

Here is one way, uses groupby twice,

df1['tmp'] = (df1.Number - df1.Number.shift() > 1).cumsum()

df1.groupby(['ID', 'tmp']).Number.count().groupby(level = 0).mean().reset_index(name = 'avg_length')

2.29 ms ± 75.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    ID  avg_length
0   400 3
1   500 2

Option 2: Without using apply twice, still uses tmp column created earlier

df1.groupby('ID').tmp.apply(lambda x: x.value_counts().mean()).reset_index(name = 'avg_length')

2.25 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

TechQA.

How to group a dataframe and summarize over subgroups of consecutive numbers in Python?

There are 2 answers

`groupby` + `cumsum` + `value_counts`

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in PANDAS-GROUPBY

Related Questions in GROUP-SUMMARIES

Popular Questions

Trending Questions

How to group a dataframe and summarize over subgroups of consecutive numbers in Python?

There are 2 answers

groupby + cumsum + value_counts

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in PANDAS-GROUPBY

Related Questions in GROUP-SUMMARIES

Popular Questions

Trending Questions

`groupby` + `cumsum` + `value_counts`