How to bar plot the top n categories for each year

Question

162 views Asked by YShastri At 31 October 2023 at 01:06

I am trying to plot a bar graph which highlights only the top 10 areas in Auckland district by the money spent on gambling. I have written the code to filter for the top 10 areas and also plot a bar plot in Seaborn.

The issue is that the x-axis is crowded with labels of every area in Auckland district from the dataframe. I only want the labels for the top 10 areas to show up. Will appreciate any help from the kind folks out here.

This is a snapshot of the dataframe I am using:

Date,AU2017_code,crime,n,Pop,AU_GMP_PER_CAPITA,Dep_Index,AU2017_name,TA2018_name,TALB
2018-02-01,500100.0,Abduction,0.0,401.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-03-01,500100.0,Abduction,0.0,402.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-04-01,500100.0,Abduction,0.0,408.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-05-01,500100.0,Abduction,0.0,409.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-06-01,500100.0,Abduction,0.0,410.0,28.890063,10.0,Awanui,Far North District,Far North District

The complete dataframe is available as a .csv file here: https://github.com/yyshastri/NZ-Police-Community-Dataset.git

The code for the creation of the bar plot is as follows:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


# Extract the year from the Date index (this assumes Date is set as a DatetimeIndex) and create a new 'Year' column
merged_data['Year'] = merged_data.index.year

# Filter data for areas that come under Auckland in the TA2018_name column
auckland_data = merged_data[merged_data['TA2018_name'] == 'Auckland']

# Calculate the average AU_GMP_PER_CAPITA for each area within Auckland
avg_gmp_per_area = auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean()

# Select the top 10 areas by AU_GMP_PER_CAPITA within Auckland
top_10_areas = avg_gmp_per_area.nlargest(10).index

# Further filter the auckland_data to include only the top 10 areas
filtered_data = auckland_data[auckland_data['AU2017_name'].isin(top_10_areas)]

# Create the figure first; calling plt.figure() after plotting opens a new, empty figure
plt.figure(figsize=(20, 10))

# Use seaborn to create the barplot
sns.barplot(x='AU2017_name', y='AU_GMP_PER_CAPITA', hue='Year', data=filtered_data)

plt.title('The top 10 areas for gambling spend in Auckland')
plt.xticks(rotation=60)
plt.legend(title='Year', loc='upper right')
plt.show()

The chart this code generates has a garbled x-axis, as every area name in the Auckland district is populated in the labels.

**Trenton McKinney** · Accepted Answer · 2023-10-31T15:57:04+00:00

- seaborn is a high-level API for matplotlib, and pandas uses matplotlib as the default plotting backend.
  - In this case, it's more direct to plot with pandas.DataFrame.plot, and avoid the extra import and dataframe reshaping.
- .pivot_table is used to reshape the dataframe and aggregate multiple values with 'mean'.
- The data for each year must be sorted separately, as the top 10 cities may not be the same for each year.
- Given the long city names, using kind='barh' (horizontal bars) looks cleaner than using kind='bar'.
- Tested in python 3.12.0, pandas 2.1.1, matplotlib 3.8.0, seaborn 0.13.0

import pandas as pd

# read the data from github
df = pd.read_csv('https://raw.githubusercontent.com/yyshastri/NZ-Police-Community-Dataset/main/Merged_Community_Police_Data.xls')

# select Auckland data
auckland_data = df[df['TA2018_name'] == 'Auckland'].copy()

# reshape the data with pivot table and aggregate the mean
dfp = auckland_data.pivot_table(index='Year', columns='AU2017_name', values='AU_GMP_PER_CAPITA', aggfunc='mean')

# for each year find the top 10 cities, and concat them into a single dataframe
top10 = pd.concat([data.sort_values(ascending=False).iloc[:10].to_frame() for _, data in dfp.iterrows()], axis=1)

# since the city names are long, use horizontal bars (barh)
ax = top10.plot(kind='barh', figsize=(5, 8), width=0.8,
                xlabel='Mean GMP PER CAPITA', ylabel='City', title='Yearly Top 10 Cities')

# alternatively, use vertical bars with kind='bar'
ax = top10.plot(kind='bar', figsize=(20, 6), width=0.8, rot=0,
                ylabel='Mean GMP PER CAPITA', xlabel='City', title='Yearly Top 10 Cities')
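
For anyone who wants to try the reshaping step without downloading the full file, here is a minimal sketch on a small made-up frame. The toy values and the helper name `top_n_per_year` are hypothetical and only for illustration. It uses `Series.nlargest` in place of `sort_values(...).iloc[:n]`, which should select the same values.

```python
import pandas as pd

# Hypothetical toy data standing in for the Auckland subset
toy = pd.DataFrame({
    'Year': [2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019],
    'AU2017_name': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'AU_GMP_PER_CAPITA': [10.0, 40.0, 30.0, 20.0, 50.0, 5.0, 15.0, 25.0],
})


def top_n_per_year(data, n):
    """Return a wide frame (areas x years) holding each year's n largest means."""
    wide = data.pivot_table(index='Year', columns='AU2017_name',
                            values='AU_GMP_PER_CAPITA', aggfunc='mean')
    # take the n largest values from each year's row; concat aligns on the
    # area name and leaves NaN where an area is not in that year's top n
    return pd.concat([row.nlargest(n) for _, row in wide.iterrows()], axis=1)


top2 = top_n_per_year(toy, 2)
print(top2)
```

The result has one column per year, and an area that makes the top n in only some years shows NaN in the others, which matches the shape of `top10` in the Data Views below.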

Using seaborn requires converting top10 from wide to long form with pandas.DataFrame.melt.
The figure-level function sns.catplot with kind='bar' is used, but the axes-level function sns.barplot will also work.
- Figure-level vs. axes-level functions

import seaborn as sns

# reshape the dataframe to long form
top10m = top10.melt(var_name='Year', value_name='Mean GMP PER CAPITA', ignore_index=False).reset_index(names=['City'])

# plot
g = sns.catplot(data=top10m, kind='bar', x='City', y='Mean GMP PER CAPITA', hue='Year', height=5, aspect=4, palette='tab10', legend='full')
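
If the axes-level sns.barplot approach from the question is kept, one thing that may be worth checking is the dtype of the name column. This is an assumption, since the question does not show the dtypes: if AU2017_name is stored as a pandas category, filtering rows does not shrink the category list, and seaborn generally takes the axis order from the full category list. The sketch below uses a hypothetical frame with made-up column names (`area`, `spend`) to show the effect and one way to drop the unused categories.

```python
import pandas as pd

# Hypothetical frame; 'area' plays the role of AU2017_name stored as a category
frame = pd.DataFrame({
    'area': pd.Categorical(['A', 'B', 'C', 'D']),
    'spend': [10.0, 40.0, 30.0, 20.0],
})

# select the top 2 areas by mean spend, then keep only their rows
top_areas = frame.groupby('area', observed=True)['spend'].mean().nlargest(2).index
filtered = frame[frame['area'].isin(top_areas)].copy()

# filtering rows does not shrink the category list
print(list(filtered['area'].cat.categories))  # ['A', 'B', 'C', 'D']

# drop the categories that no longer appear in the data
filtered['area'] = filtered['area'].cat.remove_unused_categories()
print(list(filtered['area'].cat.categories))  # ['B', 'C']
```

If the column is a plain object (string) column, as it is after pd.read_csv, this step is unnecessary.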

Data Views

`df`

auckland_data looks the same as df except it's a subset

   AU2017_code      crime  n  Pop  AU_GMP_PER_CAPITA  Dep_Index AU2017_name         TA2018_name                TALB  Year
0       500100  Abduction  0  401          28.890063       10.0      Awanui  Far North District  Far North District  2018
1       500100  Abduction  0  402          28.890063       10.0      Awanui  Far North District  Far North District  2018
2       500100  Abduction  0  408          28.890063       10.0      Awanui  Far North District  Far North District  2018
3       500100  Abduction  0  409          28.890063       10.0      Awanui  Far North District  Far North District  2018
4       500100  Abduction  0  410          28.890063       10.0      Awanui  Far North District  Far North District  2018

`dfp.iloc[:, :10]`

The first 10 columns, otherwise it's too much data to post

AU2017_name  Abbotts Park  Aiguilles Island    Akarana     Albany  Algies Bay     Ambury      Aorere   Arahanga  Arch Hill    Ardmore
Year                                                                                                                                 
2018            41.995023               0.0  48.619904  34.953781    8.989871  57.940325  111.343778  78.498990  58.685772  40.572675
2019            40.569120               0.0  47.898409  34.046811    9.073010  57.053751  112.236632  78.707498  57.905275  38.060297
2020            27.936208               0.0  35.284514  25.236172    6.720755  42.324155   84.505122  57.954157  41.092557  26.683718

`top10`

                        2018        2019        2020
AU2017_name                                         
Matheson Bay      214.762992  224.552738  172.133803
Point Wells       181.298995  188.588469  143.436274
Leigh             168.446421  172.428979  129.395604
Papakura North    128.974569  124.977594   90.942141
Fairburn          128.231566  127.925022   91.885721
Otahuhu West      127.002810  125.271241   90.084230
Otahuhu North     123.810519  123.690082   87.164136
Dingwall          118.963782         NaN   83.436386
Papatoetoe North  118.210508  113.328798         NaN
Puhinui South     116.787094  113.630079   85.114301
Papakura Central         NaN  113.442014         NaN
Aorere                   NaN         NaN   84.505122

`top10m.head()`

             City  Year  Mean GMP PER CAPITA
0    Matheson Bay  2018           214.762992
1     Point Wells  2018           181.298995
2           Leigh  2018           168.446421
3  Papakura North  2018           128.974569
4        Fairburn  2018           128.231566
