How do I reshape my pandas DataFrame to visualize it as a stacked bar chart with Bokeh?


I would like to create a stacked bar chart from this data frame, where the x-axis shows each unique date and each bar is stacked from the numeric values of the individual providers in the provider column.

[Image: grouped data frame]

When I create a pivot table, the data is aggregated for entries that have exactly the same name. If I pivot with 'provider' as the new columns, I get 5 columns and 14 rows. The issue seems to be that Bokeh's vbar_stack does not accept a different number of columns and rows; it appears to require the same number of each. However, I cannot build the pivot table without the data being aggregated.

Can I transform this data and use the Bokeh package to create a stacked bar chart?

Code:

pivot_df = grouped_df.pivot_table(index=['date'], columns='provider', values='num_youths', aggfunc='first', fill_value=0)

pivot_df.reset_index(inplace=True)

source = ColumnDataSource(pivot_df)

providers = pivot_df.columns[1:]

# Create the figure
p = figure(x_range=pivot_df['date'].unique(), plot_height=350, title="Number of Youths Funded by Provider Each Month",
           toolbar_location=None, tools="")

# Add stacked bars to the figure
p.vbar_stack(stackers=providers, x='date', width=0.9, color=["blue", "red"], source=source,
             legend_label=providers)

Error message: ValueError: Keyword argument sequences for broadcasting must be the same length as stackers

Answer by mosc9575 (best answer)

You have to reshape your pandas DataFrame so that every provider becomes its own column, with one row per date.

Pandas

Below is a minimal example of your data. I use groupby and unstack with fill_value=0 to insert zeros where a provider has no value on a given date.

Afterwards I drop the extra column level (the MultiIndex) of the returned DataFrame and reset the index.

import pandas as pd

df = pd.DataFrame({
    'date': ['Aug 23', 'Aug 23', 'Dec 23'],
    'provider': ['A', 'B', 'C'],
    'num_youths': [1, 3, 4]
    }
)
df

>>> df
     date provider  num_youths
0  Aug 23        A           1
1  Aug 23        B           3
2  Dec 23        C           4

# group by date and provider, then fill missing combinations with zero
stacked = df.groupby(['date','provider']).sum().unstack(fill_value=0)
>>> stacked 
         num_youths      
provider          A  B  C
date                     
Aug 23            1  3  0
Dec 23            0  0  4

# drop multi index for columns and index
stacked.columns = stacked.columns.droplevel()
provider = list(stacked.columns)
stacked = stacked.reset_index()

To get the data bokeh wants, you have to call to_dict with orient="list".

data = stacked.to_dict(orient='list')
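Since the question already uses pivot_table, the same wide layout can probably be reached that way as well. The following is only a sketch on the minimal example data from above; using aggfunc='sum' instead of 'first' is an assumption about how repeated date/provider pairs should be combined.

```python
import pandas as pd

# Minimal example data, same as in the answer above
df = pd.DataFrame({
    'date': ['Aug 23', 'Aug 23', 'Dec 23'],
    'provider': ['A', 'B', 'C'],
    'num_youths': [1, 3, 4],
})

# One row per date, one column per provider, zeros where a provider is missing.
# aggfunc='sum' is an assumption; it adds up repeated date/provider pairs.
wide = df.pivot_table(index='date', columns='provider',
                      values='num_youths', aggfunc='sum', fill_value=0)

provider = list(wide.columns)                      # stacker names for vbar_stack
data = wide.reset_index().to_dict(orient='list')   # plain dict of lists

print(provider)
print(data)
```

Every list in data has the same length (one entry per date), which is the shape vbar_stack expects from its source.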

bokeh

The data now has the correct format, so just call figure() and vbar_stack. Most of this code comes from the stacked bar example in the Bokeh docs.

from bokeh.plotting import figure, show, output_notebook
from bokeh.palettes import HighContrast3
output_notebook()

p = figure(x_range=data['date'], height=250, 
           toolbar_location=None, tools="hover", tooltips="@date $name @$name")

p.vbar_stack(provider, x='date', width=0.9, color=HighContrast3, source=data,
             legend_label=provider)

show(p)

[Image: stacked bar plot]
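A note on the ValueError from the question: vbar_stack broadcasts list-valued keyword arguments such as color and legend_label across the stackers, so each of those lists needs exactly one entry per stacker. In the question, color=["blue", "red"] has two entries while the pivot produces more provider columns than that. The example above works because HighContrast3 holds three colors for three providers. A minimal sketch of one way to match the lengths follows; the helper colors_for is hypothetical (not part of Bokeh) and simply repeats a palette until it is long enough.

```python
from itertools import cycle, islice

def colors_for(stackers, palette):
    """Return one color per stacker, repeating the palette if it is too short.

    Hypothetical helper for illustration; not a Bokeh function.
    """
    if not palette:
        raise ValueError("palette must contain at least one color")
    return list(islice(cycle(palette), len(stackers)))

providers = ["A", "B", "C", "D", "E"]   # assumed: five provider columns
palette = ["blue", "red"]               # the two colors from the question

colors = colors_for(providers, palette)
print(colors)  # ['blue', 'red', 'blue', 'red', 'blue']
```

The resulting list can then be passed as color=colors, together with legend_label=providers as a plain list. Repeated colors make the stacks harder to read, so a palette with at least as many distinct colors as providers is usually the better choice.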