Plotting a CDF from a multiclass pandas dataframe


I understand the package empiricaldist provides a CDF function as per the documentation.

However, I find it tricky to plot a CDF from my dataframe when one of the columns holds multiple classes.

df.head()
    +------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
    |      | trip_id | seconds_start | seconds_end | duration | distance | speed | acceleration | lat_start | lon_start |  lat_end  |  lon_end  | travelmode |
    +------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
    | 0    |  318410 |    1461743310 |  1461745298 |     1988 | 5121.49  | 2.58  | 0.00130      | 41.162687 | -8.615425 | 41.177888 | -8.597549 | car        |
    | 1    |  318411 |    1461749359 |  1461750290 |      931 | 1520.71  | 1.63  | 0.00175      | 41.177949 | -8.597074 | 41.177839 | -8.597574 | bus        |
    | 2    |  318421 |    1461806871 |  1461806941 |       70 | 508.15   | 7.26  | 0.10370      | 37.091240 | -8.211239 | 37.092322 | -8.206681 | foot       |
    | 3    |  318422 |    1461837354 |  1461838024 |      670 | 1207.39  | 1.80  | 0.00269      | 37.092082 | -8.205060 | 37.091659 | -8.206462 | car        |
    | 4    |  318425 |    1461852790 |  1461853845 |     1055 | 1470.49  | 1.39  | 0.00132      | 37.091628 | -8.202143 | 37.092095 | -8.205070 | foot       |
    +------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+

I would like to plot a separate CDF for each value of the travelmode column.

groups = df.groupby('travelmode')

However, I don't really understand how this could be done from the documentation.

There is 1 answer

m13op22 On BEST ANSWER

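You can plot them in a loop, using only the rows that belong to each travel mode:

import matplotlib.pyplot as plt
from empiricaldist import Cdf

def decorate_plot(title):
    ''' Adds labels to plot '''
    plt.xlabel('Outcome')
    plt.ylabel('CDF')
    plt.title(title)

# groupby yields (travel mode, sub-dataframe) pairs, so each CDF
# is built from that mode's rows only rather than the whole frame
for tm, group in df.groupby('travelmode'):
    for col in group.columns:
        if col != 'travelmode':
            # Create a new figure for each plot
            fig, ax = plt.subplots()
            cdf = Cdf.from_seq(group[col])
            cdf.plot()
            decorate_plot(f"{tm} - {col}")

plt.show()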
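
If it helps to see what a per-mode CDF actually contains, here is a minimal dependency-free sketch. It is not how empiricaldist is implemented; the helper names `empirical_cdf` and `cdf_by_group` are made up for illustration, and the sample durations are the five rows from the `df.head()` output in the question.

```python
from collections import defaultdict

def empirical_cdf(values):
    """Return (xs, ps): sorted unique values and P(X <= x) for each one."""
    data = sorted(values)
    n = len(data)
    xs, ps = [], []
    for i, x in enumerate(data, start=1):
        if xs and xs[-1] == x:
            # Repeated value: raise the step at that x instead of adding a point
            ps[-1] = i / n
        else:
            xs.append(x)
            ps.append(i / n)
    return xs, ps

def cdf_by_group(records):
    """Group (label, value) pairs by label and compute one CDF per label."""
    grouped = defaultdict(list)
    for label, value in records:
        grouped[label].append(value)
    return {label: empirical_cdf(vals) for label, vals in grouped.items()}

# (travelmode, duration) pairs copied from the sample rows in the question
records = [("car", 1988), ("bus", 931), ("foot", 70), ("car", 670), ("foot", 1055)]
cdfs = cdf_by_group(records)
for mode, (xs, ps) in cdfs.items():
    print(mode, xs, ps)
```

With a real dataframe you could feed it `zip(df['travelmode'], df['duration'])`. To overlay the modes in one figure, each `(xs, ps)` pair can be passed to `plt.step(xs, ps, where='post', label=mode)`, followed by `plt.legend()`.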