UpSetPlot from actual sets


I want to use UpSetPlot with the actual sets I have, but I cannot find any example that uses it this way. The standard example is this:

from upsetplot import generate_counts, plot
example = generate_counts()
plot(example, orientation='vertical')

where the generated example is a pandas Series that looks like this:

cat0   cat1   cat2 
False  False  False      56
              True      283
       True   False    1279
              True     5882
True   False  False      24
              True       90
       True   False     429
              True     1957
Name: value, dtype: int64

Is there a way to automatically generate this kind of count structure from the actual elements in the categories cat0, cat1, and cat2?

Answer by user5054 (best answer)

Using the tip by @StupidWolf in another answer, here is an answer to my own question. Given 3 sets

set1 = {0,1,2,3,4,5}
set2 = {3,4,5,6,10}
set3 = {0,5,6,7,8,9}

here is the code to draw an UpSet plot of these three sets:

import pandas as pd
from upsetplot import plot
set_names = ['set1', 'set2', 'set3']
all_elems = set1.union(set2).union(set3)
df = pd.DataFrame([[e in set1, e in set2, e in set3] for e in all_elems], columns=set_names)
df_up = df.groupby(set_names).size()
plot(df_up, orientation='horizontal')


To generalize the code above to a list of sets, say sets = [set1, set2, set3], replace its 4th and 5th lines with the following (set_names must then contain one name per set):

all_elems = list(set().union(*sets))
df = pd.DataFrame([[e in st for st in sets] for e in all_elems], columns=set_names)
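The same idea can be wrapped in a small function that takes a dict of named sets, so the names and the sets cannot get out of step. This is a minimal sketch using only pandas; `counts_from_sets` is a hypothetical helper name, not part of upsetplot. It returns the same kind of boolean-MultiIndex Series as `generate_counts()`.

```python
import pandas as pd


def counts_from_sets(named_sets):
    """Count elements per exact membership pattern (hypothetical helper).

    named_sets maps a set name to a Python set of elements.
    """
    names = list(named_sets)
    all_elems = set().union(*named_sets.values())
    # One row per distinct element, one boolean column per named set
    df = pd.DataFrame(
        [[e in named_sets[name] for name in names] for e in all_elems],
        columns=names,
    )
    # Grouping on every column gives a Series indexed by a boolean MultiIndex
    return df.groupby(names).size()


set1 = {0, 1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 10}
set3 = {0, 5, 6, 7, 8, 9}
counts = counts_from_sets({'set1': set1, 'set2': set2, 'set3': set3})
print(counts)
```

The resulting Series can then be passed to plot(counts, orientation='horizontal') exactly like df_up above.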

Answer by StupidWolf

It looks like the output of a pandas groupby to me, i.e. a Series with a boolean MultiIndex:

import numpy as np
import pandas as pd

from upsetplot import generate_counts, plot
example = generate_counts()
type(example)

pandas.core.series.Series

example.index

MultiIndex([(False, False, False),
            (False, False,  True),
            (False,  True, False),
            (False,  True,  True),
            ( True, False, False),
            ( True, False,  True),
            ( True,  True, False),
            ( True,  True,  True)],
           names=['cat0', 'cat1', 'cat2'])

So if your DataFrame of boolean membership columns looks like this:

df = pd.DataFrame(np.random.choice([True,False],(100,3)),
                  columns=['cat0','cat1','cat2'])

You can do:

example = df.groupby(['cat0','cat1','cat2']).size()
plot(example, orientation='vertical')
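One difference from generate_counts(): groupby(...).size() only lists the membership combinations that actually occur, whereas the generated example lists all eight. If you want every combination present, with 0 for empty intersections, one option (a sketch in plain pandas, using a small made-up frame for illustration) is to reindex over the full product of True/False values:

```python
import pandas as pd

cats = ['cat0', 'cat1', 'cat2']
# Small illustrative frame: each row is one element, each column a membership flag
df = pd.DataFrame(
    [[True, False, False],
     [True, True, True],
     [False, False, True]],
    columns=cats,
)

counts = df.groupby(cats).size()  # only the 3 combinations that occur
# Every True/False combination, in the order shown in the generate_counts() output
full_index = pd.MultiIndex.from_product([[False, True]] * len(cats), names=cats)
counts = counts.reindex(full_index, fill_value=0)
print(counts)
```

upsetplot does not require the empty combinations, so this step is only needed if you want the Series to match the generated example's layout.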


The limitation, I think, is that the columns cat0, cat1 and cat2 have to be boolean membership indicators, so the raw set elements must first be converted into that form.
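If your data is not already boolean, for example a long table with one row per (element, category) membership, pd.crosstab can build the boolean indicator columns first. A minimal sketch with made-up example data:

```python
import pandas as pd

# Illustrative long-format data: one row per (element, category) membership
pairs = pd.DataFrame({
    'element':  ['a',    'a',    'b',    'c',    'c',    'c',    'd'],
    'category': ['cat0', 'cat1', 'cat1', 'cat0', 'cat1', 'cat2', 'cat2'],
})

# crosstab counts each (element, category) pair; "> 0" turns those counts
# into one boolean membership column per category, one row per element
indicators = pd.crosstab(pairs['element'], pairs['category']) > 0
cats = list(indicators.columns)

counts = indicators.groupby(cats).size()
print(counts)
```

If you instead have 0/1 integer columns, df.astype(bool) gives the boolean form before the groupby.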

Answer by joeln

There are several ways that sets can be used to represent category membership. To help translate sets into the format required by upsetplot, the library provides the helpers from_memberships, from_contents and from_indicators.

See also the Data Format Guide.