How to add text to a box plot by matching and extracting data from two DataFrames


I have two DataFrames, df1 and df2, built as follows:

import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt 


np.random.seed(12)

# Generate first df
col_a1 = [i for i in range(18)] * 15  
col_b1 = np.random.randint(1, 100, len(col_a1))
configs = ['c1', 'c2', 'c3', 'c4']
col_c1 = [configs[i//90] + '_' + f'abctrct{i//18}' for i in range(len(col_a1))]
df1 = pd.DataFrame({'A': col_a1, 'B': col_b1, 'C': col_c1})

# Generate second df
# pd.unique keeps first-appearance order; iterating a set() of strings is not reproducible
col_d2 = [s + '-' + f'{np.random.randint(1,18)}' for s in pd.unique(df1['C'])]
df2 = pd.DataFrame({'D': col_d2, 'E': np.random.randint(100, 200, len(col_d2))})
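Each df2['D'] label is a df1['C'] value, a '-', and a random number from 1 to 17, so together the two parts identify exactly one df1 row. A minimal sketch of that lookup, assuming a merge is acceptable (this is one convenient approach, not the code from the question): split each label on its last '-' into a C key and an A number, then inner-merge with df1 on both columns. It regenerates the data above; the names keys, lookup and matches are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (col_a typo fixed, set() swapped for pd.unique).
np.random.seed(12)
col_a1 = [i for i in range(18)] * 15
col_b1 = np.random.randint(1, 100, len(col_a1))
configs = ['c1', 'c2', 'c3', 'c4']
col_c1 = [configs[i // 90] + '_' + f'abctrct{i // 18}' for i in range(len(col_a1))]
df1 = pd.DataFrame({'A': col_a1, 'B': col_b1, 'C': col_c1})
col_d2 = [s + '-' + f'{np.random.randint(1, 18)}' for s in pd.unique(df1['C'])]
df2 = pd.DataFrame({'D': col_d2, 'E': np.random.randint(100, 200, len(col_d2))})

# Split each label on its LAST '-' into the df1['C'] key and the df1['A'] number.
keys = df2['D'].str.rsplit('-', n=1, expand=True)
lookup = pd.DataFrame({'C': keys[0], 'A': keys[1].astype(int)})

# The inner merge keeps only df1 rows whose (C, A) pair appears in df2,
# so each label is matched inside its own box, whatever df2's row order is.
matches = lookup.merge(df1, on=['C', 'A'], how='inner')
matches[['Hue', 'X']] = matches['C'].str.split('_', expand=True)
print(matches[['X', 'A', 'B']])
```

Because the match uses both C and A, each row of matches holds the B value (first label) and the A value (second label) for one box, with no reliance on positional indexing such as 18*i + j.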

I plot the data as a box plot with a strip plot overlaid:

df1[['Hue', 'X']] = df1['C'].str.split('_', expand=True)

fig, ax  = plt.subplots()
sns.stripplot(y='B', x='X', data=df1, hue='Hue')
sns.boxplot(y='B', x='X', data=df1, hue='Hue')
plt.xticks(rotation=45, ha='right')

On top of each box plot, I want to add two text labels, each with a bounding box. The first label is the value of df1['B'] in the row where df1['A'] == df2['D'].str.split('-').str[1]; in other words, where the numeric part of df2['D'] matches df1['A'] (within the same box, since the part before the '-' is a df1['C'] value). The second label is the df1['A'] value of that matching row. My approach is the following:

# list of (group key, Series of B values) tuples, one per X category
mean = [i for i in (df1.groupby(['X'], sort=False)['B'])]
df2[['S', 'num']] = df2['D'].str.split('-', expand=True)
idx = [i for i in range(len(mean))]
number = [int(i) for i in df2['num']]
# from group i, pick the B value whose original row label is 18*i + j
values = [mean[i][1][18*i+j] for i, j in zip(idx, number)]

for xtick in ax.get_xticks():
    ax.text(xtick, mean[xtick], f'bend = {values[xtick]}',
            horizontalalignment='center', verticalalignment='center', rotation=90, fontsize=20,
            bbox={'facecolor': 'green', 'alpha': 0.5, 'pad': 10})

but I get the following error:

ConversionError: Failed to convert value(s) to axis units: ('abctrct0', 0     76
1     28
1

Answer by Corralien:

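The problem is mean[xtick]: mean is a list of (group name, Series) tuples, so ax.text receives a tuple as its y coordinate, which Matplotlib cannot convert to axis units. You probably want mean[xtick][1].max() to place the text at the top of each box's data: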
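To see why a tuple ends up in ax.text, here is a minimal sketch on a tiny hypothetical frame: iterating a grouped column yields (key, Series) pairs, while groupby(...).max() returns the per-box maxima directly as plain numbers.

```python
import pandas as pd

# Tiny hypothetical stand-in for df1, just to show the shapes involved.
df = pd.DataFrame({'X': ['a', 'a', 'b', 'b'], 'B': [76, 28, 5, 40]})

groups = list(df.groupby('X', sort=False)['B'])
name, series = groups[0]         # each item is a (group key, Series) pair
print(type(groups[0]).__name__)  # tuple
print(name, series.max())        # a 76

# What ax.text needs instead: one number per x position.
tops = df.groupby('X', sort=False)['B'].max().tolist()
print(tops)                      # [76, 40]
```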

fig, ax  = plt.subplots()
sns.stripplot(y='B', x='X', data=df1, hue='Hue', ax=ax)  # use ax=ax
sns.boxplot(y='B', x='X', data=df1, hue='Hue', ax=ax)  # use ax=ax
plt.xticks(rotation=45, ha='right')

...

for xtick in ax.get_xticks():
    ax.text(xtick, mean[xtick][1].max(), f'bend = {values[xtick]}',
            horizontalalignment='center', verticalalignment='center', 
            rotation=90, fontsize=20,
            bbox={'facecolor': 'green', 'alpha': 0.5, 'pad': 10})
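The code above draws one label per box. A sketch of both labels from the question (the matched B value and, stacked above it, the matched A value) could look like the following. Assumptions: it rebuilds the data without df2's unused E column; it finds each box's row by splitting df2['D'] on its last '-' and merging with df1 on C and A; it draws plain Matplotlib boxes at positions 0..n-1, which is where seaborn places categorical x values, so the same annotate loop should carry over to the seaborn plot; and the label texts "B = <value>" and "A = <value>" are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the question's data (typo fixed; df2's unused 'E' column omitted).
np.random.seed(12)
col_a1 = list(range(18)) * 15
configs = ['c1', 'c2', 'c3', 'c4']
df1 = pd.DataFrame({
    'A': col_a1,
    'B': np.random.randint(1, 100, len(col_a1)),
    'C': [configs[i // 90] + f'_abctrct{i // 18}' for i in range(len(col_a1))],
})
df1[['Hue', 'X']] = df1['C'].str.split('_', expand=True)
df2 = pd.DataFrame({'D': [c + f'-{np.random.randint(1, 18)}' for c in pd.unique(df1['C'])]})

# Match each df2 label to its df1 row on both the C key and the A number.
keys = df2['D'].str.rsplit('-', n=1, expand=True)
lookup = pd.DataFrame({'C': keys[0], 'A': keys[1].astype(int)})
matches = lookup.merge(df1, on=['C', 'A'], how='inner').set_index('X')

order = list(pd.unique(df1['X']))  # x categories in order of appearance
positions = list(range(len(order)))
fig, ax = plt.subplots(figsize=(10, 5))
# Boxes at positions 0..n-1, the same positions seaborn uses for categorical x.
ax.boxplot([df1.loc[df1['X'] == x, 'B'].to_numpy() for x in order], positions=positions)
ax.set_xticks(positions, order, rotation=45, ha='right')

tops = df1.groupby('X')['B'].max()  # highest B value in each category
bbox = {'facecolor': 'green', 'alpha': 0.5, 'pad': 2}
for pos, x in zip(positions, order):
    # First label: the matched B value, just above the highest point.
    ax.annotate(f"B = {matches.loc[x, 'B']}", xy=(pos, tops[x]), xytext=(0, 6),
                textcoords='offset points', ha='center', va='bottom',
                fontsize=8, bbox=bbox)
    # Second label: the matched A value, stacked above the first one.
    ax.annotate(f"A = {matches.loc[x, 'A']}", xy=(pos, tops[x]), xytext=(0, 24),
                textcoords='offset points', ha='center', va='bottom',
                fontsize=8, bbox=bbox)
ax.set_ylim(top=df1['B'].max() * 1.3)  # leave headroom for the labels
fig.tight_layout()
fig.canvas.draw()
```

Offsets in points (textcoords='offset points') keep the two boxes a fixed distance apart on screen regardless of the y scale, which is simpler than guessing a data-space offset.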

Output: [image of the annotated box plot]