Coloring data points using a vector of RGB values for each data point

I have a pandas DataFrame and want to use seaborn's stripplot to visualize the spread of my data. This is my first time using seaborn. I would like to color the data points that are outliers, so I created a column containing an RGB tuple for each value. I have used this approach before and find it very convenient, so I would love to make it work with seaborn.

This is how the dataframe might look:

   SUBJECT  CONDITION(num)       hit  hit_box_outliers  \
0      4.0             1.0  0.807692                 0   
1      4.0             2.0  0.942308                 0   
2      4.0             3.0  1.000000                 0   
3      4.0             4.0  1.000000                 0   
4      5.0             1.0  0.865385                 0   

                                         hit_colours  
0  (0.38823529411764707, 0.38823529411764707, 0.3...  
1  (0.38823529411764707, 0.38823529411764707, 0.3...  
2  (0.38823529411764707, 0.38823529411764707, 0.3...  
3  (0.38823529411764707, 0.38823529411764707, 0.3...  
4  (0.38823529411764707, 0.38823529411764707, 0.3...  

Then I try to plot it here:

sns.stripplot(x='CONDITION(num)', y='hit', data=edfg, jitter=True, color=edfg['hit_colours'])

and I get the following error:

ValueError: Could not generate a palette for <map object at 0x000002265939FB00>

Any ideas for how I can achieve this seemingly easy task?

Answer by ImportanceOfBeingErnest:

It seems you want to distinguish between points that are outliers and points that are not. There are therefore only two cases, and they are determined by the column hit_box_outliers.
You can use this column as the hue for the stripplot. To assign a custom color to each of the two cases, pass a palette (a list of colors, one per hue level).

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Example data: 4 conditions, 25 random values each, with a random 0/1 outlier flag
df = pd.DataFrame({"CONDITION(num)": np.tile([1, 2, 3, 4], 25),
                   "hit": np.random.rand(100),
                   "hit_box_outliers": np.random.randint(2, size=100)})

# hue splits the points by outlier flag; palette gives one color per flag value
sns.stripplot(x="CONDITION(num)", y="hit", hue="hit_box_outliers", data=df, jitter=True,
              palette=["limegreen", (0.4, 0, 0.8)])

plt.show()
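
If the DataFrame already has a color column such as hit_colours, the palette can be derived from it instead of being typed by hand. The following is a minimal sketch, assuming every row with the same hit_box_outliers value carries the same RGB tuple. The helper name palette_from_column and the two RGB values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical colors for illustration: grey for regular points, red for outliers.
GREY = (0.39, 0.39, 0.39)
RED = (0.8, 0.1, 0.1)


def palette_from_column(df, hue_col, colour_col):
    """Return {hue level: color}, taking the first color seen for each level."""
    firsts = df.drop_duplicates(subset=hue_col)
    return firsts.set_index(hue_col)[colour_col].to_dict()


# Example data: every tenth row is flagged as an outlier.
outliers = (np.arange(100) % 10 == 0).astype(int)
df = pd.DataFrame({
    "CONDITION(num)": np.tile([1, 2, 3, 4], 25),
    "hit": np.random.rand(100),
    "hit_box_outliers": outliers,
    "hit_colours": [RED if flag else GREY for flag in outliers],
})

palette = palette_from_column(df, "hit_box_outliers", "hit_colours")
print(palette)
```

The resulting dict maps each hue level to its color, so it can be passed as palette=palette together with hue="hit_box_outliers" in the sns.stripplot call.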

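
If a genuinely separate color per point is needed (more than a handful of hue levels), one alternative is to bypass stripplot and draw the points with matplotlib's scatter, whose c argument accepts a sequence of RGB tuples, one per point. This is a sketch of that different technique, not of anything seaborn does internally: the jitter is added by hand, and the jitter width of 0.1 and the example colors are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical colors for illustration: grey for regular points, red for outliers.
GREY = (0.39, 0.39, 0.39)
RED = (0.8, 0.1, 0.1)

rng = np.random.default_rng(0)

# Example data: every tenth row is flagged as an outlier and colored red.
outliers = (np.arange(100) % 10 == 0).astype(int)
df = pd.DataFrame({
    "CONDITION(num)": np.tile([1, 2, 3, 4], 25),
    "hit": rng.random(100),
    "hit_colours": [RED if flag else GREY for flag in outliers],
})

# Manual horizontal jitter so points in the same condition do not overlap.
jitter = rng.uniform(-0.1, 0.1, size=len(df))

fig, ax = plt.subplots()
points = ax.scatter(
    df["CONDITION(num)"] + jitter,
    df["hit"],
    c=df["hit_colours"].tolist(),  # one RGB tuple per data point
)
ax.set_xticks(sorted(df["CONDITION(num)"].unique()))
ax.set_xlabel("CONDITION(num)")
ax.set_ylabel("hit")
```

Calling plt.show() afterwards displays the figure. The trade-off is that seaborn's categorical axis handling and legend are lost, so this is mainly useful when the colors cannot be expressed as a small palette.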