How to group rows by values to create new columns in Pandas DataFrame?

35 views Asked by At

I have a dataframe df which looks like this :

`    image_id  category_id chessboard_position
0         0            7                  a8
1         0            8                  b8
2         0            9                  c8
3         1            6                  f7
4         1            6                  g7
5         1            6                  h7
6         1            0                  b4
7         1            0                  a2`

and I want it to looks like this :

`    image_id      0           6           7   8   9           
0         0                                a8  b8  c8
1         1    {b4, a2}   {f7, g7, h7}                  

I tried to use : df = df.groupby('image_id')["chessboard_position"].apply(lambda x: pd.Series(x.values)).unstack(1).

This gives me something akin to :

`    image_id          1   2   3    4      5  
0         0            a8  b8  c8                             
1         1            f7  g7  h7   b4    a2          
           `

It fills the new columns by using chessboard_positions values in order. I need something that will allows me to sort the "unstacking" using the category_id value.

1

There are 1 answers

0
Andrej Kesely On BEST ANSWER

You can use pd.crosstab:

out = (
    pd.crosstab(
        df["image_id"],
        df["category_id"],
        df["chessboard_position"],
        aggfunc=lambda v: v if len(v) == 1 else set(v),
    )
    .reset_index()
    .rename_axis(columns=None)
)

print(out)

Prints:

   image_id         0             6    7    8    9
0         0       NaN           NaN   a8   b8   c8
1         1  {a2, b4}  {g7, h7, f7}  NaN  NaN  NaN