Pandas shuffle rows within groups in dataframe, leaving the relative groups order intact

Question

Asked by hadji on 18 March 2024 at 11:49

Given the following df:

import pandas as pd

data = {'Org': ['Tom', 'Kelly', 'Rick', 'Dave', 'Sara', 'Liz'],
        'sum': [3, 4, 4, 4, 5, 5]}
df = pd.DataFrame(data)

#      Org  sum
# 0    Tom    3
# 1  Kelly    4
# 2   Rick    4
# 3   Dave    4
# 4   Sara    5
# 5    Liz    5

I want to shuffle only the rows that share the same value in the 'sum' column (the duplicates), while keeping the rows in sorted order of 'sum'.

Output should look like this:

data = {'Org': ['Tom', 'Rick', 'Dave', 'Kelly','Liz','Sara'],
        'sum': [3, 4, 4, 4, 5, 5]}
df = pd.DataFrame(data)

#      Org  sum
# 0    Tom    3
# 1   Rick    4
# 2   Dave    4
# 3  Kelly    4
# 4    Liz    5
# 5   Sara    5

With df.sample(frac=1), all rows are shuffled, which is not what I want to achieve.
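To make the requirement concrete, here is a small checker. It is a sketch, and is_within_group_shuffle is a hypothetical helper, not a pandas function. It assumes a valid result must keep the 'sum' column identical position by position and keep every Org attached to its original 'sum' value:

```python
import pandas as pd


def is_within_group_shuffle(original, shuffled, key, col):
    """Hypothetical helper: True if `shuffled` is a valid within-group shuffle."""
    # The key column must be unchanged, position by position.
    same_key_order = original[key].tolist() == shuffled[key].tolist()
    # Every (key, col) pair must survive, i.e. rows only move inside their group.
    same_pairs = (sorted(zip(original[key], original[col]))
                  == sorted(zip(shuffled[key], shuffled[col])))
    return same_key_order and same_pairs


df = pd.DataFrame({'Org': ['Tom', 'Kelly', 'Rick', 'Dave', 'Sara', 'Liz'],
                   'sum': [3, 4, 4, 4, 5, 5]})
expected = pd.DataFrame({'Org': ['Tom', 'Rick', 'Dave', 'Kelly', 'Liz', 'Sara'],
                         'sum': [3, 4, 4, 4, 5, 5]})

print(is_within_group_shuffle(df, expected, 'sum', 'Org'))         # True
# Reversing all rows breaks the sorted 'sum' column.
print(is_within_group_shuffle(df, df.iloc[::-1], 'sum', 'Org'))    # False
```

A plain df.sample(frac=1) will usually fail this check, because it is free to move rows across groups and break the ordering of 'sum'.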

Original Q&A

**mozway** · Accepted Answer · 2024-03-18T11:50:57+00:00

sorted groups

If your groups are contiguous and you want to keep their relative order, use groupby.sample with sort=False (the groups are then emitted in order of first appearance):

out = df.groupby('sum', sort=False).sample(frac=1)

Example output:

     Org  sum
0    Tom    3
3   Dave    4
1  Kelly    4
2   Rick    4
5    Liz    5
4   Sara    5
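A self-contained sketch of the sort=False approach on the question's data. The random_state argument is an addition (not part of the answer) that only makes a run reproducible; the properties printed below hold for any shuffle:

```python
import pandas as pd

df = pd.DataFrame({'Org': ['Tom', 'Kelly', 'Rick', 'Dave', 'Sara', 'Liz'],
                   'sum': [3, 4, 4, 4, 5, 5]})

# random_state makes the shuffle reproducible; drop it for a fresh shuffle each run.
out = df.groupby('sum', sort=False).sample(frac=1, random_state=42)

# The 'sum' column keeps its order: each contiguous group is emitted in full,
# one group after another, in order of first appearance.
print(out['sum'].tolist())   # [3, 4, 4, 4, 5, 5]

# Each group still holds the same rows, only possibly in a different order.
for key, grp in out.groupby('sum', sort=False):
    print(key, sorted(grp['Org']))
# 3 ['Tom']
# 4 ['Dave', 'Kelly', 'Rick']
# 5 ['Liz', 'Sara']
```

The result keeps the original index labels; append .reset_index(drop=True) if you prefer a fresh 0..n-1 index.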

If you want the output sorted by sum, then:

out = df.groupby('sum').sample(frac=1)  # sort=True is the default
# or
out = df.sample(frac=1).sort_values(by='sum', kind='stable')

which will ensure that the groups are sorted, even if they are not sorted in the input.
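A sketch of both sorted variants on a hypothetical unsorted input (the rows below are made up for illustration). Whatever the random draw, 'sum' comes out in ascending order, because groupby with the default sort=True emits groups by ascending key, and a stable sort preserves the shuffled order among equal keys:

```python
import pandas as pd

# Hypothetical unsorted input: the groups are neither sorted nor contiguous.
df = pd.DataFrame({'Org': ['Sara', 'Tom', 'Kelly', 'Liz', 'Rick', 'Dave'],
                   'sum': [5, 3, 4, 5, 4, 4]})

# Option 1: default sort=True emits the groups in ascending key order.
out1 = df.groupby('sum').sample(frac=1, random_state=0)

# Option 2: shuffle everything, then a stable sort on 'sum' keeps the
# shuffled order within each key.
out2 = df.sample(frac=1, random_state=0).sort_values(by='sum', kind='stable')

print(out1['sum'].tolist())   # [3, 4, 4, 4, 5, 5]
print(out2['sum'].tolist())   # [3, 4, 4, 4, 5, 5]
```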

intact groups

Conversely, if you want to leave the original positions of the groups fully intact and only shuffle the rows within each group, as in this example:

     Org  sum
0    Tom    3
1  Kelly    4
2   Rick    4
3   Sara    5
4    Liz    5
5   Dave    4 # this is part of group "4" but we want the row to stay there

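Then use groupby.transform to shuffle the index labels within each group (each position receives a label from its own group), and select the rows with loc using those shuffled labels: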

out = df.loc[df.groupby('sum', sort=False)['sum']
               .transform(lambda g: g.sample(frac=1).index)]

Example output:

     Org  sum
0    Tom    3
2   Rick    4
5   Dave    4
4    Liz    5
3   Sara    5
1  Kelly    4 # the group was shuffled, not the absolute position
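A self-contained sketch of the transform approach on the example input above (new_labels is an illustrative name). Each position receives an index label drawn without replacement from its own group, so the 'sum' pattern is preserved exactly while the rows inside each group move:

```python
import pandas as pd

# Example input from the answer: group 4 is split around group 5.
df = pd.DataFrame({'Org': ['Tom', 'Kelly', 'Rick', 'Sara', 'Liz', 'Dave'],
                   'sum': [3, 4, 4, 5, 5, 4]})

# For every row, transform returns an index label sampled (without replacement)
# from the same group, aligned back to the original row positions.
new_labels = df.groupby('sum', sort=False)['sum'].transform(
    lambda g: g.sample(frac=1).index
)

# Look the rows up by the shuffled labels: every position keeps its group.
out = df.loc[new_labels]

print(out['sum'].tolist())   # [3, 4, 4, 5, 5, 4]
print(out)
```

This relies on a unique index, since the shuffled labels are looked up with loc; if the index has duplicates, call reset_index(drop=True) on df first.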
