Barplot grouped by class and time-interval

I have request response-time data in a pandas DataFrame:

    execution_time       request_type  response_time_ms  URL                      Error
2   2023-10-12 08:52:16  Google        91.0              https://www.google.com   NaN
3   2023-10-12 08:52:16  CNN           115.0             https://edition.cnn.com  NaN
6   2023-10-12 08:52:27  Google        90.0              https://www.google.com   NaN
7   2023-10-12 08:52:27  CNN           105.0             https://edition.cnn.com  NaN
10  2023-10-12 08:52:37  Google        5111.0            https://www.google.com   NaN

execution_time is the time of the request, request_type is simply the website name, and response_time_ms is the response time in milliseconds.

What I want to achieve is a bar plot of the median response time grouped by website (request_type) and by time frame, say every 4 hours. This should show how the response time varies with the time of day.

I managed to create the plot, but the coloring is off: I want the different websites to be colored differently.

What I have so far:

df_by_time = df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")]).agg({"response_time_ms": ["median"]})
df_by_time.plot(kind='bar', figsize=(8, 6), title='Response Times', xlabel='Type', ylabel='Response time [ms]', rot=90) 

This produces the image below:

[image: Response Times by hour]

I would like to:

  • group the times together so that each time appears only once, with a stack in a different color per website
  • or at least have the different websites in different colors in this plot
  • get rid of the "none, none" in the legend

How can I achieve that?

Answer by mozway (best answer)

If I understand correctly, you need to aggregate with 'median', not ['median'], to avoid the MultiIndex columns. Then you can use seaborn.barplot:

import pandas as pd
import seaborn as sns

df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time",
                                                     freq="4h")])
                .agg({"response_time_ms": "median"})
                .reset_index()
             )

sns.barplot(data=df_by_time, x='execution_time', y='response_time_ms',
            hue='request_type')
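
To see the difference between the two aggregation forms, here is a minimal sketch that rebuilds the five sample rows from the question (the URL and Error columns are left out, and the names keys, as_list and as_str are only illustrative). The list form yields two unnamed column levels, which is most likely where the "None, None" legend label comes from, while the string form keeps a single flat column.

```python
import pandas as pd

# Sample rows copied from the question (URL and Error columns omitted)
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 08:52:27", "2023-10-12 08:52:27",
        "2023-10-12 08:52:37",
    ]),
    "request_type": ["Google", "CNN", "Google", "CNN", "Google"],
    "response_time_ms": [91.0, 115.0, 90.0, 105.0, 5111.0],
})

keys = ["request_type", pd.Grouper(key="execution_time", freq="4h")]

# A list of aggregations produces two-level (MultiIndex) columns:
# ('response_time_ms', 'median'), with both level names unset
as_list = df.groupby(keys).agg({"response_time_ms": ["median"]})

# A single string keeps one flat column named 'response_time_ms'
as_str = df.groupby(keys).agg({"response_time_ms": "median"})

print(as_list.columns.nlevels, list(as_list.columns.names))
print(as_str.columns.nlevels, list(as_str.columns))
```

Calling .reset_index() on as_str then gives the long-format table that seaborn.barplot expects.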

Alternatively, use groupby.median to produce a Series and unstack to use pandas' plot.bar:

df_by_time = (df.groupby(["request_type", pd.Grouper(key="execution_time", freq="4h")])
                ['response_time_ms'].median()
                .unstack('request_type')
             )

df_by_time.plot.bar()

Output:

[image: output plot]

Aggregating every 20 s instead of every 4 h shows the behavior with multiple time groups:

[image: output plot with 20 s aggregation]
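
For the stacked variant asked about in the question (each time appearing once, with one colored segment per website), a possible sketch is to pass stacked=True to pandas' plot.bar on the unstacked frame. The sample rows are rebuilt from the question, the 20 s frequency is used only so that the tiny sample yields more than one time group, and the name medians, the axis labels and the legend title are illustrative choices rather than anything from the original answer.

```python
import pandas as pd

# Sample rows copied from the question (URL and Error columns omitted)
df = pd.DataFrame({
    "execution_time": pd.to_datetime([
        "2023-10-12 08:52:16", "2023-10-12 08:52:16",
        "2023-10-12 08:52:27", "2023-10-12 08:52:27",
        "2023-10-12 08:52:37",
    ]),
    "request_type": ["Google", "CNN", "Google", "CNN", "Google"],
    "response_time_ms": [91.0, 115.0, 90.0, 105.0, 5111.0],
})

# One row per time bin, one column per website
medians = (
    df.groupby(["request_type", pd.Grouper(key="execution_time", freq="20s")])
      ["response_time_ms"].median()
      .unstack("request_type")
)

# Shorten the tick labels from full timestamps to HH:MM:SS
medians.index = medians.index.strftime("%H:%M:%S")

# stacked=True draws one bar per time bin, with one colored segment per website
ax = medians.plot.bar(
    stacked=True, rot=0, figsize=(8, 6), title="Response Times",
    xlabel="Time", ylabel="Median response time [ms]",
)
ax.legend(title="Website")
```

In a script, follow this with matplotlib.pyplot.show() or ax.figure.savefig(...) to display or save the figure. Dropping stacked=True gives side-by-side bars per website instead.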