Pandas Python - Create subplots from 2 CSV columns

I am trying to create two subplots: the first is a pie plot (this works), and the second is a bar plot (this is where I am stuck):

These are the columns:

data screenshot

My Code:

top_series = all_data.head(50).groupby('Top Rated ')['Top Rated '].count()
top_values = top_series.values.tolist()
top_index = ['Top Rated', 'Not Top Rated']
top_colors = ['#27AE60', '#E74C3C']

all_data['Rating_Cat'] = all_data['Rating'].apply(lambda x : 'High' if (x > 10000000 ) else 'Low')

rating_series = all_data.head(50).groupby('Rating_Cat')['Rating_Cat'].count()
rating_values = rating_series.values.tolist()
rating_index = ['High' , 'Low']
rating_colors = ['#F1C40F', '#27AE60']

fig, axs = plt.subplots(1,2, figsize=(16,5))
axs[0].pie(top_values, labels=top_index, autopct='%1.1f%%', shadow=True, startangle=90,
           explode=(0.05, 0.05), radius=1.2, colors=top_colors, textprops={'fontsize':12})

all_data['Rating_Cat'].value_counts().plot(kind = 'bar', ax=axs[1])
fig.suptitle('Does "Rating" really affect on Top Sellers ?' , fontsize=17)

My question:
How can I create the second plot so that the output looks like this:

X axis = 1, 2, 3, 4 ... 50, plus Top Rated / No (according to the current column)
Y axis = the rating, from 0 to 7603388.0

I have tried lots of things, but I am kind of lost here. Please help!

There is 1 answer

Daniel Wlazło

In the first plot, you take the first 50 rows of the dataset and plot the share of each value in the "Top Rated " column.
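
One detail worth checking in that first plot: groupby returns its counts in sorted key order, so a hard-coded label list lines up only if it follows the same order. A minimal sketch, assuming the column holds the values 'Top Rated' and 'No' (the sample frame is made up for illustration and is not the asker's data):

```python
import pandas as pd

# Made-up sample; the real values come from the asker's CSV.
sample = pd.DataFrame(
    {"Top Rated ": ["Top Rated", "No", "Top Rated", "No", "Top Rated"]}
)

# groupby sorts its keys, so 'No' comes before 'Top Rated'.
counts = sample.groupby("Top Rated ")["Top Rated "].count()
print(counts.index.tolist())  # ['No', 'Top Rated']
print(counts.tolist())        # [2, 3]

# Reorder explicitly so the values match a fixed label list.
ordered = counts.reindex(["Top Rated", "No"], fill_value=0)
print(ordered.tolist())       # [3, 2]
```

Passing counts.index as the pie labels is another way to keep values and labels in step.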

If I understand the second plot correctly, you want each Rating from the first 100 rows plotted as a bar, in row order, with the bar color based on the "Top Rated " column (use head(50) instead if you want the same rows as in the first plot):

# take the first 100 rows
rating_series = all_data.head(100).copy()
# assign a color to each row, so it can be passed to the bar() plot
rating_series["color"] = rating_series["Top Rated "].map({"Top Rated": "#27AE60", "No": "#E74C3C"})
# plot the values
axs[1].bar(rating_series.index, rating_series["Rating"], color=rating_series["color"])
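
Note that Series.map returns NaN for any value that is missing from the dictionary (for example a label with different capitalization or a stray space), and NaN is not a usable bar color. A small sketch of how to spot and patch such rows; the sample labels and the gray fallback color are assumptions for illustration:

```python
import pandas as pd

color_map = {"Top Rated": "#27AE60", "No": "#E74C3C"}

# Made-up labels; the third one has a trailing space and will not match.
labels = pd.Series(["Top Rated", "No", "Top Rated ", "No"])

colors = labels.map(color_map)

# Values that the dictionary did not cover.
unmapped = labels[colors.isna()].unique().tolist()
print(unmapped)  # ['Top Rated ']

# Hypothetical fallback color so bar() still receives one valid color per row.
colors = colors.fillna("#95A5A6")
print(colors.tolist())
```

If unmapped is not empty, it is usually better to clean the column (for example with .str.strip()) than to rely on the fallback.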

If you want to add a legend to the plot, you have to build it manually:

import matplotlib.patches as mpatches
axs[1].legend(handles=[mpatches.Patch(color='#27AE60', label='Top Rated'),
               mpatches.Patch(color='#E74C3C', label='Not Top Rated')])
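
The question asks for bars numbered 1 to 50, while the DataFrame index starts at 0. One way to get that numbering is to pass explicit positions to bar(). This is only a sketch: the random frame below is a stand-in for the asker's CSV, and the Agg backend is chosen just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data (assumption): the real all_data comes from the asker's CSV.
rng = np.random.default_rng(0)
all_data = pd.DataFrame({
    "Rating": rng.integers(0, 7603388, size=200),
    "Top Rated ": rng.choice(["Top Rated", "No"], size=200),
})

color_map = {"Top Rated": "#27AE60", "No": "#E74C3C"}
first_50 = all_data.head(50)

# Bar positions 1, 2, ..., 50 instead of the 0-based index.
positions = np.arange(1, len(first_50) + 1)
bar_colors = first_50["Top Rated "].map(color_map).tolist()

fig, ax = plt.subplots(figsize=(16, 5))
bars = ax.bar(positions, first_50["Rating"].to_numpy(), color=bar_colors)

# One tick per bar, labelled with its row number.
ax.set_xticks(positions)
ax.tick_params(axis="x", labelsize=8)
ax.set_xlabel("Row number")
ax.set_ylabel("Rating")

# Manual legend, one patch per category.
ax.legend(handles=[
    mpatches.Patch(color="#27AE60", label="Top Rated"),
    mpatches.Patch(color="#E74C3C", label="Not Top Rated"),
])
```

In the two-panel figure, the same calls would go on axs[1] instead of a fresh ax.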

Edit: My whole code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
df = pd.DataFrame(
    {
        "Rating": np.random.randint(0,7603388,size=200),
        "Top Rated ": [random.choice(['Top Rated', 'No']) for rated in range(0,200)]
    }
)

#taking first 100 rows
rating_series = df.head(100).copy()
#assigning color to the values, so you could use it in bar() plot
rating_series["color"] = rating_series["Top Rated "].map({"Top Rated": "#27AE60", "No": "#E74C3C"})
#checking if there were no NaNs
rating_series["color"].value_counts(dropna=False)

#Output (the counts vary from run to run because the data is random):

#E74C3C    53
#27AE60    47
#Name: color, dtype: int64

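#1st plot
#groupby sorts its keys ('No' before 'Top Rated'), so reorder the counts to match the labels below
top_series = rating_series.groupby('Top Rated ')['Top Rated '].count().reindex(['Top Rated', 'No'], fill_value=0)
top_index = ['Top Rated', 'Not Top Rated']
top_colors = ['#27AE60', '#E74C3C']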

fig, axs = plt.subplots(1,2, figsize=(16,5))
axs[0].pie(top_series.values, labels=top_index, autopct='%1.1f%%', shadow=True, startangle=90,
           explode=(0.05, 0.05), radius=1.2, colors=top_colors, textprops={'fontsize':12})

#2nd plot
axs[1].bar(rating_series.index, rating_series["Rating"], color = rating_series["color"])
axs[1].legend(handles=[mpatches.Patch(color='#27AE60', label='Top Rated'),
               mpatches.Patch(color='#E74C3C', label='Not Top Rated')])

output