How do I sum a column based on separate category types, and preserve the zeros?

80 views Asked by At

Here is my dataframe:

Id Category Hours
1 A 1
1 A 3
1 B 4
2 A 2
2 B 6
3 A 3

And here is the output I want:

Id Total Hours A_Hours B_Hours
1 5 4 4
2 8 2 6
3 3 3 0

How do I achieve this?

I tried various methods of grouping and aggregation, even calculating the A_hours series separately and appending it to the dataframe but I didn't find a way to both calculate the zeros ( based on absence of hours under a particular category) and maintain the order.

3

There are 3 answers

3
mozway On BEST ANSWER

Use a pivot_table:

out = (df.pivot_table(index='Id', columns='Category', values='Hours',
                      aggfunc='sum', fill_value=0,
                      margins=True, margins_name='Total')
         .add_suffix('_Hours')
         .drop('Total').reset_index().rename_axis(columns=None)
      )

Output:

   Id  A_Hours  B_Hours  Total_Hours
0   1        4        4            8
1   2        2        6            8
2   3        3        0            3
0
Gabriel Santello On

Complementing the answer given by @Michael Cao, you can perfom the .loc to create the "Total Hours" per row:

summary = df.groupby(['Id', 'Category'])[['Hours']].sum().reset_index().pivot(index = 'Id', columns = 'Category', values = 'Hours').fillna(0)
summary.loc[:,'Total Hours'] = summary.sum(axis=1)
0
Michael Cao On

Perform a groupby by both Id and Category to get the hours per category and then pivot.

df.groupby(['Id', 'Category'])[['Hours']].sum().reset_index().pivot(index = 'Id', columns = 'Category', values = 'Hours').fillna(0)