Given the following df:
import pandas as pd

data = {'Org': ['Tom', 'Kelly', 'Rick', 'Dave', 'Sara', 'Liz'],
        'sum': [3, 4, 4, 4, 5, 5]}
df = pd.DataFrame(data)
#      Org  sum
# 0    Tom    3
# 1  Kelly    4
# 2   Rick    4
# 3   Dave    4
# 4   Sara    5
# 5    Liz    5
I want to shuffle only the rows that share the same value in the sum column, while keeping that column in sorted order.
Output should look like this:
data = {'Org': ['Tom', 'Rick', 'Dave', 'Kelly', 'Liz', 'Sara'],
        'sum': [3, 4, 4, 4, 5, 5]}
df = pd.DataFrame(data)
#      Org  sum
# 0    Tom    3
# 1   Rick    4
# 2   Dave    4
# 3  Kelly    4
# 4    Liz    5
# 5   Sara    5
With df.sample(frac=1), all rows are shuffled, which is not what I want to achieve.
sorted groups
If your groups are contiguous and you want to keep their relative order, use groupby.sample.
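A minimal sketch of this approach, assuming the df from the question. Passing sort=False to groupby (so the groups stay in their order of first appearance), the random_state seed, and the final reset_index are choices made for this sketch, not something the text above prescribes:

```python
import pandas as pd

df = pd.DataFrame({'Org': ['Tom', 'Kelly', 'Rick', 'Dave', 'Sara', 'Liz'],
                   'sum': [3, 4, 4, 4, 5, 5]})

# Shuffle the rows within each 'sum' group. sort=False keeps the groups
# in their order of first appearance. random_state is only here to make
# the run reproducible; drop it for a different shuffle each time.
out = df.groupby('sum', sort=False).sample(frac=1, random_state=0)

# Optional: renumber the rows 0..n-1 after shuffling.
out = out.reset_index(drop=True)
print(out)
```

The sum column keeps its order, and only the Org values inside each group of equal sums change places.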
If you want the output sorted by sum, then let groupby sort the group keys (sort=True, which is its default), which will ensure that the groups are sorted, even if they are not sorted in the input.
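As a sketch of the sorted variant, the example below uses a hypothetical unsorted input (the rows of the question's df in a different order) and spells out sort=True explicitly, although it is the groupby default:

```python
import pandas as pd

# Hypothetical unsorted input: equal 'sum' values are adjacent,
# but the groups themselves are not in ascending order.
df = pd.DataFrame({'Org': ['Sara', 'Liz', 'Tom', 'Kelly', 'Rick', 'Dave'],
                   'sum': [5, 5, 3, 4, 4, 4]})

# sort=True orders the groups by key, and sample(frac=1) shuffles
# the rows inside each group.
out = df.groupby('sum', sort=True).sample(frac=1)
print(out)
```

The original index labels are kept, so they show where each row came from; call reset_index(drop=True) if a fresh 0..n-1 index is preferred.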
intact groups
Conversely, if you want to leave the original order of the groups fully intact, even when the rows of a group are not contiguous, but still want to shuffle within each group, use groupby.transform to shuffle the indices in place, then reindex.
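One possible way to write this, shown on a hypothetical input where one row of group 4 is separated from the rest of its group. The variable names and the use of .to_numpy() are choices made for this sketch:

```python
import pandas as pd

# Hypothetical input: 'Dave' belongs to group 4 but sits after group 5.
df = pd.DataFrame({'Org': ['Tom', 'Kelly', 'Rick', 'Sara', 'Liz', 'Dave'],
                   'sum': [3, 4, 4, 5, 5, 4]})

# For each group, return its index labels in shuffled order. Returning a
# plain array (rather than a Series) avoids index alignment, which would
# otherwise put the labels back in their original places.
shuffled_idx = df.groupby('sum')['sum'].transform(
    lambda g: g.sample(frac=1).index.to_numpy()
)

# Reindex: every row position receives a random row from its own group,
# so the 'sum' column stays exactly as it was in the input.
out = df.loc[shuffled_idx]
print(out)
```

Because transform returns a result aligned with the original rows, the positions of the groups do not move; only the rows within each group are permuted.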