I have dataframe df with following data.

A    B    C    D
1    1    3    1
1    2    9    8
1    3    3    9
2    1    2    9
2    2    1    4
2    3    9    5
2    4    6    4
3    1    4    1
3    2    0    4
4    1    2    6
5    1    2    4
5    2    8    3

grp = df.groupby('A')

Next I want to make all groups of dataframe df grouped on columns A to have same number of rows. Either Truncate extra rows or pad 0 rows. For above data, I want to make all groups to have 3 rows. I required the following results.

A    B    C    D
1    1    3    1
1    2    9    8
1    3    3    9
2    1    2    9
2    2    1    4
2    3    9    5
3    1    4    1
3    2    0    4
3    0    0    0
4    1    2    6
4    0    0    0
4    0    0    0
5    1    2    4
5    2    8    3
5    0    0    0

Similarly, I may want to groupby on multiple columns, like grp = df.groupby(['A','B'])

1 Answers

2
jezrael On Best Solutions

Use GroupBy.cumcount for counter column with DataFrame.reindex by MultiIndex.from_product:

df['g'] = df.groupby('A').cumcount()
mux = pd.MultiIndex.from_product([df['A'].unique(), range(3)], names=('A','g'))

df = (df.set_index(['A','g'])
        .reindex(mux, fill_value=0)
        .reset_index(level=1, drop=True)
        .reset_index())
print (df)
    A  B  C  D
0   1  1  3  1
1   1  2  9  8
2   1  3  3  9
3   2  1  2  9
4   2  2  1  4
5   2  3  9  5
6   3  1  4  1
7   3  2  0  4
8   3  0  0  0
9   4  1  2  6
10  4  0  0  0
11  4  0  0  0
12  5  1  2  4
13  5  2  8  3
14  5  0  0  0

Another solution with DataFrame.merge with left join with helper DataFrame:

from  itertools import product

df['g'] = df.groupby('A').cumcount()
df1 = pd.DataFrame(list(product(df['A'].unique(), range(3))), columns=['A','g'])

df = df1.merge(df, how='left').fillna(0).astype(int).drop('g', axis=1)
print (df)
    A  B  C  D
0   1  1  3  1
1   1  2  9  8
2   1  3  3  9
3   2  1  2  9
4   2  2  1  4
5   2  3  9  5
6   3  1  4  1
7   3  2  0  4
8   3  0  0  0
9   4  1  2  6
10  4  0  0  0
11  4  0  0  0
12  5  1  2  4
13  5  2  8  3
14  5  0  0  0

EDIT:

df['g'] = df.groupby(['A','B']).cumcount()

mux = pd.MultiIndex.from_product([df['A'].unique(), 
                                  df['B'].unique(), 
                                  range(3)], names=('A','B','g'))
df = (df.set_index(['A','B','g'])
        .reindex(mux, fill_value=0)
        .reset_index(level=2, drop=True)
        .reset_index())
print (df.head(10))
   A  B  C  D
0  1  1  3  1
1  1  1  0  0
2  1  1  0  0
3  1  2  9  8
4  1  2  0  0
5  1  2  0  0
6  1  3  3  9
7  1  3  0  0
8  1  3  0  0
9  1  4  0  0

from  itertools import product

df['g'] = df.groupby(['A','B']).cumcount()
df1 = pd.DataFrame(list(product(df['A'].unique(),
                                df['B'].unique(), 
                                range(3))), columns=['A','B','g'])

df = df1.merge(df, how='left').fillna(0).astype(int).drop('g', axis=1)