I am trying to plot a bar graph which highlights only the top 10 areas in Auckland district by the money spent on gambling. I have written the code to filter for the top 10 areas and also plot a bar plot in Seaborn.
The issue is that the x-axis is crowded with labels of every area in Auckland district from the dataframe. I only want the labels for the top 10 areas to show up. I'd appreciate any help from the kind folks out here.
This is a snapshot of the dataframe I am using:
Date,AU2017_code,crime,n,Pop,AU_GMP_PER_CAPITA,Dep_Index,AU2017_name,TA2018_name,TALB
2018-02-01,500100.0,Abduction,0.0,401.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-03-01,500100.0,Abduction,0.0,402.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-04-01,500100.0,Abduction,0.0,408.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-05-01,500100.0,Abduction,0.0,409.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-06-01,500100.0,Abduction,0.0,410.0,28.890063,10.0,Awanui,Far North District,Far North District
The complete dataframe is available as a .csv file here: https://github.com/yyshastri/NZ-Police-Community-Dataset.git
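The code below calls merged_data.index.year, which only works if Date has been parsed as datetimes and set as the index. Here is a minimal sketch of loading data in that shape. It assumes merged_data comes straight from a CSV with the columns shown, and it inlines the five snapshot rows with io.StringIO so the example runs on its own. For the full file, pass the path of the downloaded CSV instead.

```python
import io

import pandas as pd

# The five snapshot rows from the question, inlined so the sketch is self-contained
csv_text = """Date,AU2017_code,crime,n,Pop,AU_GMP_PER_CAPITA,Dep_Index,AU2017_name,TA2018_name,TALB
2018-02-01,500100.0,Abduction,0.0,401.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-03-01,500100.0,Abduction,0.0,402.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-04-01,500100.0,Abduction,0.0,408.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-05-01,500100.0,Abduction,0.0,409.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-06-01,500100.0,Abduction,0.0,410.0,28.890063,10.0,Awanui,Far North District,Far North District
"""

# parse_dates converts Date to datetimes; index_col makes it a DatetimeIndex
merged_data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

# With a DatetimeIndex, .index.year is available
merged_data['Year'] = merged_data.index.year
print(merged_data[['AU2017_name', 'TA2018_name', 'Year']])
```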
The code for the creation of the bar plot is as follows:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Extract the year from the Date index and create a new 'Year' column
merged_data['Year'] = merged_data.index.year
# Filter data for areas that come under Auckland in the TA2018_name column
auckland_data = merged_data[merged_data['TA2018_name'] == 'Auckland']
# Calculate the average AU_GMP_PER_CAPITA for each area within Auckland
avg_gmp_per_area = auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean()
# Select the top 10 areas by AU_GMP_PER_CAPITA within Auckland
top_10_areas = avg_gmp_per_area.nlargest(10).index
# Further filter the auckland_data to include only the top 10 areas
filtered_data = auckland_data[auckland_data['AU2017_name'].isin(top_10_areas)]
# Create the figure first so that figsize applies to this plot
plt.figure(figsize=(20, 10))
# Use seaborn to create the barplot
sns.barplot(x='AU2017_name', y='AU_GMP_PER_CAPITA', hue='Year', data=filtered_data)
plt.title('The top 10 areas for gambling spend in Auckland')
plt.xticks(rotation=60)
plt.legend(title='Year', loc='upper right')
plt.show()
The chart this code generates has a garbled x-axis, because every area name in the Auckland district is populated in the labels.
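One likely cause, assuming AU2017_name is a pandas Categorical in merged_data (the question doesn't show the dtypes): boolean filtering keeps every category, and seaborn takes the order of a categorical axis from the column's categories. The filtered-out areas therefore still get empty slots and labels. Here is a minimal sketch on hypothetical toy data (the area names and values are made up) that shows the effect and two ways around it.

```python
import matplotlib

matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 12 areas over two years, area name stored as a categorical
areas = [f'Area {i:02d}' for i in range(12)]
df = pd.DataFrame({
    'AU2017_name': pd.Categorical(areas * 2),
    'AU_GMP_PER_CAPITA': [float(i) for i in range(24)],
    'Year': [2018] * 12 + [2019] * 12,
})

# Boolean filtering keeps the dtype, so all 12 categories survive the filter
keep = ['Area 00', 'Area 01', 'Area 02']
filtered = df[df['AU2017_name'].isin(keep)]
n_categories_before = len(filtered['AU2017_name'].cat.categories)  # 12

# Fix 1: drop the categories that no longer occur in the filtered rows
filtered = filtered.assign(
    AU2017_name=filtered['AU2017_name'].cat.remove_unused_categories()
)
n_categories_after = len(filtered['AU2017_name'].cat.categories)  # 3

# Create the figure before plotting; plt.figure() after plotting opens a new, empty figure
fig, ax = plt.subplots(figsize=(10, 5))
# Fix 2 (instead of Fix 1): pass order=keep to sns.barplot to restrict the x-axis levels
sns.barplot(x='AU2017_name', y='AU_GMP_PER_CAPITA', hue='Year', data=filtered, ax=ax)
ax.tick_params(axis='x', labelrotation=60)
```

If AU2017_name is a plain object (string) column instead, seaborn only uses the values present in the filtered data, so this would not be the cause.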
Use pandas.DataFrame.plot, which uses matplotlib as the default plotting backend. This avoids the extra seaborn import and the long-form reshaping. .pivot_table is used to reshape the dataframe and aggregate multiple values with 'mean'. kind='barh' (horizontal bars) looks cleaner than kind='bar'.
Versions used: python 3.12.0, pandas 2.1.1, matplotlib 3.8.0, seaborn 0.13.0.
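Here is a minimal sketch of the pivot_table approach described above. It runs on hypothetical toy data standing in for auckland_data: the area names and values are made up, and the real column names are kept. Areas are ranked by their overall mean, as in the question, and only those 10 columns are plotted, so only 10 tick labels appear.

```python
import matplotlib

matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for auckland_data: 12 areas, two years, two rows per area and year
rows = [
    {'Year': year, 'AU2017_name': f'Area {i:02d}', 'AU_GMP_PER_CAPITA': 10.0 * i + (year - 2018)}
    for year in (2018, 2019)
    for i in range(12)
    for _ in range(2)
]
auckland_data = pd.DataFrame(rows)

# Wide form: one row per Year, one column per area, duplicate rows averaged with 'mean'
dfp = auckland_data.pivot_table(
    index='Year', columns='AU2017_name', values='AU_GMP_PER_CAPITA', aggfunc='mean'
)

# Rank areas by their overall mean and keep only the top 10 columns, in ranked order
top_10_areas = (
    auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean().nlargest(10).index
)
top10 = dfp[top_10_areas]

# Transpose so areas become the bar positions; one bar per Year within each area
ax = top10.T.plot(kind='barh', figsize=(8, 6), title='The top 10 areas for gambling spend in Auckland')
ax.set_xlabel('AU_GMP_PER_CAPITA')
ax.invert_yaxis()  # put the highest-ranked area at the top
plt.tight_layout()
```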
seaborn requires converting top10 from wide to long form with pandas.DataFrame.melt. sns.catplot with kind='bar' is used, but the axes-level function sns.barplot will also work.

Data Views
df
auckland_data looks the same as df, except it's a subset
dfp.iloc[:, :10]
top10
top10m.head()
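The seaborn route can be sketched the same way, again on hypothetical toy data. The sketch builds top10 in wide form, melts it into long form (top10m), and passes that to sns.catplot. Converting Year to strings is an added assumption that makes seaborn treat the hue as discrete labels in the legend.

```python
import matplotlib

matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for auckland_data: 12 areas, two years, two rows per area and year
rows = [
    {'Year': year, 'AU2017_name': f'Area {i:02d}', 'AU_GMP_PER_CAPITA': 10.0 * i + (year - 2018)}
    for year in (2018, 2019)
    for i in range(12)
    for _ in range(2)
]
auckland_data = pd.DataFrame(rows)

# Wide form, then keep the 10 areas with the largest overall mean
dfp = auckland_data.pivot_table(
    index='Year', columns='AU2017_name', values='AU_GMP_PER_CAPITA', aggfunc='mean'
)
top_10_areas = (
    auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean().nlargest(10).index
)
top10 = dfp[top_10_areas]

# Long form for seaborn: one row per (Year, area) pair
top10m = top10.reset_index().melt(
    id_vars='Year', var_name='AU2017_name', value_name='AU_GMP_PER_CAPITA'
)
top10m = top10m.astype({'Year': str})  # treat Year as discrete hue labels

# Figure-level catplot; sns.barplot(data=top10m, ...) on an existing Axes would also work
g = sns.catplot(
    data=top10m, kind='bar', x='AU_GMP_PER_CAPITA', y='AU2017_name',
    hue='Year', height=6, aspect=1.5,
)
```

Because top10m only contains the 10 selected areas as plain strings, the y-axis only gets those 10 labels.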