I have two DataFrames (df1 and df2), generated as below:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(12)
# Generate first df
col_a1 = [i for i in range(18)] * 15
col_b1 = np.random.randint(1, 100, len(col_a1))
configs = ['c1', 'c2', 'c3', 'c4']
col_c1 = [configs[i//90] + '_' + f'abctrct{i//18}' for i in range(len(col_a1))]
df1 = pd.DataFrame({'A': col_a1, 'B': col_b1, 'C': col_c1})
# Generate second df
col_d2 = [s + '-' +f'{np.random.randint(1,18)}' for s in [x for x in list(set(col_c1))]]
df2 = pd.DataFrame({'D': col_d2, 'E': np.random.randint(100, 200, len(col_d2))})
I am plotting a box plot for the data as below:
df1[['Hue', 'X']] = df1['C'].str.split('_', expand=True)
fig, ax = plt.subplots()
sns.stripplot(y='B', x='X', data=df1, hue='Hue')
sns.boxplot(y='B', x='X', data=df1, hue='Hue')
plt.xticks(rotation=45, ha='right')
On top of each box plot, I want to add two text boxes (with a box boundary). The first text is the value of df1['B'] where df1['A'] == df2['D'].str.split('-').str[1]; in other words, the row where the numeric part of df2['D'] matches df1['A']. The value for the second text box will come from the df1['A'] that matches the condition. My approach is the following:
mean = [i for i in (df1.groupby(['X'], sort=False)['B'])]
df2[['S', 'num']] = df2['D'].str.split('-', expand=True)
idx = [i for i in range(len(mean))]
number = [int(i) for i in df2['num']]
values = [mean[i][1][18*i+j] for i, j in zip(idx, number)]
for xtick in ax.get_xticks():
    ax.text(xtick, mean[xtick], f'bend = {values[xtick]}',
            horizontalalignment='center', verticalalignment='center', rotation=90, fontsize=20,
            bbox={'facecolor': 'green', 'alpha': 0.5, 'pad': 10})
but I am getting the following error:
ConversionError: Failed to convert value(s) to axis units: ('abctrct0', 0 76
1 28
There is something wrong with mean[xtick]: mean is a list of tuples (group name, Series of B values), so it cannot be used as a y coordinate. Maybe you want to use mean[xtick][1].max() to determine the top of your boxplot?
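To make this concrete, here is a minimal sketch on a toy dataset (the values and the names groups, tops, labels and annotations are made up for illustration; only the column layout follows the question). It shows that iterating the groupby gives (key, Series) tuples, reduces each group to a single number for the y position, and looks up the matching B value with a merge instead of positional indexing. The merge is my own suggestion, not something from the original code.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (made-up values, same columns as the question).
df1 = pd.DataFrame({
    'A': [0, 1, 2, 0, 1, 2],
    'B': [76, 28, 7, 3, 4, 68],
    'C': ['c1_abctrct0'] * 3 + ['c1_abctrct1'] * 3,
})
df1[['Hue', 'X']] = df1['C'].str.split('_', expand=True)
df2 = pd.DataFrame({'D': ['c1_abctrct1-2', 'c1_abctrct0-1']})

# Iterating a groupby yields (key, Series) tuples; a tuple is not a valid y value.
groups = list(df1.groupby('X', sort=False)['B'])

# Reduce each group to one number instead, here the maximum of B per category.
tops = df1.groupby('X', sort=False)['B'].max()

# Split D into the group name and its numeric suffix, then find B via a merge
# on (C, A). This does not depend on the row order of df2.
df2[['C', 'A']] = df2['D'].str.rsplit('-', n=1, expand=True)
df2['A'] = df2['A'].astype(int)
labels = df2.merge(df1, on=['C', 'A'], how='left').set_index('X')

# One (x position, y position, text) triple per category, in plot order.
annotations = [
    (pos, int(tops[x]), f"bend = {int(labels.loc[x, 'B'])}")
    for pos, x in enumerate(tops.index)
]
```

Each triple can then be passed to ax.text(pos, y, text, ...) inside the loop over the ticks, assuming the categories are drawn in the same order as tops.index. The matched A value for the second text box is available the same way as labels.loc[x, 'A'].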