Summarising pandas dataframe in single row

Question

Summarising pandas dataframe in single row

416 views Asked by windwalker At 07 October 2020 at 19:45

I am hoping for some help in summarizing the dataframe detailed below into a one row summary as shown in desired output further down on the page. Many thanks in advance.

employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                         'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                         'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                         'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                         'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                         'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                        }
            
df = pd.DataFrame(employees)

Desired output:

Name  Dept  Team     Start      End      Weeks Total Calls  Ave. Call time  Sold  Rejected  more info
Mark   21    2    2020-02-19 2020-07-06  19.71      13          0.01207       7       4        2

The logic I am seeking to apply is (although I'm guessing there are errors in the syntax I have written below, I hope that you are still able to understand the calculations):

Start = min date in df['Log']
End = max date in df['Log']
Weeks = (max date in df['log'] - min date in df['Log'])/7
Total Calls = df['Log'].count
Ave. Call time = (df['Call Duration'].sum)/(df['Log'].count)
Sold = (df['ITT']=='YES').count
Rejected = (df['ITT']=='NO').count
more info = (df['ITT']=='Follow up').count

Original Q&A

There are 2 answers

Vishal Kamlapure On 07 October 2020 at 19:55

U had syntax error, where you forgot to put commas at the end of each key. And now u can work on this dataframe.

import pandas as pd
    employees = {'Name=': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                             'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                             'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                             'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                             'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                             'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                            }
                
    df = pd.DataFrame(employees)
    print(df)

Output:-

              Name  Department  ... Call Duration        ITT
              Mark         21  ...       0.01178         NO
              Mark         21  ...       0.01736        YES
              Mark         21  ...       0.01923         NO
              Mark         21  ...       0.00911  Follow up
              Mark         21  ...       0.01007        YES
              Mark         21  ...       0.01206        YES
              Mark         21  ...       0.01256         NO
              Mark         21  ...       0.01006  Follow up
              Mark         21  ...       0.01162        YES
              Mark         21  ...       0.00733        YES
              Mark         21  ...       0.01250         NO
              Mark         21  ...       0.01013        YES
              Mark         21  ...       0.01308        YES

[13 rows x 6 columns]

**Scott Boston** · Accepted Answer · 2020-10-07T20:10:43+00:00

Try this using pd.NamedAgg with groupby:

df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

df.groupby(['Name of Employee', 'Team', 'Department'])\
  .agg(Start = ('Log','min'),
       End = ('Log', 'max'),
        Weeks = ('Log', lambda x: np.ptp(x) / np.timedelta64(7, 'D')),
        Total_Calls = ('Log', 'count'),
        Avg_Call_Time = ('Call Duration', 'mean'),
        Sold = ('ITT', lambda x: (x == 'YES').sum()),
        Rejected = ('ITT', lambda x: (x == 'NO').sum()),
        More_info = ('ITT', lambda x: (x=='Follow up').sum()))

Output:

                                               Start                 End      Weeks  Total_Calls  Avg_Call_Time  Sold  Rejected  More_info
Name of Employee Team Department                                                                                                          
Mark             2    21         2020-02-19 09:01:17 2020-07-06 11:00:31  19.726114           13       0.012068     7         4          2

TechQA.

Summarising pandas dataframe in single row

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in ROW

Related Questions in SUMMARIZE

Popular Questions

Popular Tags

Trending Questions