Subsetting columns and counting the 1's (TURF analysis?)


The aim is to count the co-occurring 1's across the rows for each subset of two or more columns:

    0   2   4
0   0   1   0
1   1   1   1
2   1   0   0
3   1   1   0
4   1   0   0
... ... ... ...

In the above example there would be 4 subsets ({0, 2}, {0, 4}, {2, 4} and {0, 2, 4}). The idea is then to summarize these counts in a bar plot where each bar is labelled according to its subset.
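A minimal sketch of that counting step, assuming a row counts toward a subset when every column in the subset holds a 1 (so the 1 1 1 row counts toward all four subsets) and using only the five rows shown above. The helper name subset_counts and the "0 & 2" label format are hypothetical, illustrative choices, not part of any library:

```python
from itertools import combinations

import pandas as pd

# Only the five rows shown in the question (the "..." rows are left out).
df = pd.DataFrame({0: [0, 1, 1, 1, 1], 2: [1, 1, 0, 1, 0], 4: [0, 1, 0, 0, 0]})


def subset_counts(frame: pd.DataFrame, min_size: int = 2) -> pd.Series:
    """Hypothetical helper: for every subset of at least `min_size` columns,
    count the rows in which all columns of that subset are 1."""
    counts = {}
    for size in range(min_size, frame.shape[1] + 1):
        for cols in combinations(frame.columns, size):
            # A row counts when every column in the subset holds a 1.
            label = " & ".join(map(str, cols))
            counts[label] = int(frame[list(cols)].eq(1).all(axis=1).sum())
    return pd.Series(counts)


counts = subset_counts(df)
print(counts)
# counts.plot.bar() would draw one labelled bar per subset (requires matplotlib).
```

With these rows, the subset 0 & 2 gets a count of 2 and each of the other three subsets gets 1.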

The aim is to make an UpSet plot:

[image: example UpSet plot]

Accepted answer by mozway

It looks like you're looking for an UpSet plot, which the upsetplot package can draw:

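# pip install upsetplot
import matplotlib.pyplot as plt
import upsetplot

# count each distinct True/False row pattern and draw the UpSet plot
upsetplot.plot(df.astype(bool).value_counts())
plt.show()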

Output:

[image: UpSet plot of the True/False row-pattern counts]
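For reference, a sketch of what upsetplot receives here, again using only the five example rows: df.astype(bool).value_counts() returns a Series of counts indexed by a boolean MultiIndex with one level per column, which is the per-subset input format that upsetplot's plotting functions accept.

```python
import pandas as pd

# Only the five rows shown in the question (the "..." rows are left out).
df = pd.DataFrame({0: [0, 1, 1, 1, 1], 2: [1, 1, 0, 1, 0], 4: [0, 1, 0, 0, 0]})

# One entry per distinct True/False row pattern; the MultiIndex levels
# are named after the columns (0, 2 and 4).
patterns = df.astype(bool).value_counts()
print(patterns)
print(list(patterns.index.names))
```

The pattern True, False, False (a 1 in column 0 only) appears twice; the other three patterns appear once each.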

With all combinations (including those with a count of 0):

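from itertools import product

# reindex on every True/False combination so missing ones appear with a count of 0
upsetplot.plot(df.astype(bool).value_counts()
               .reindex(list(product([True, False], repeat=df.shape[1])), fill_value=0))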

[image: UpSet plot including the combinations with a count of 0]
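A sketch of the reindexing step on its own, with the same five rows: product from itertools enumerates all 2**3 = 8 True/False combinations, and reindex(..., fill_value=0) adds the ones that never occur with a count of 0.

```python
from itertools import product

import pandas as pd

# Only the five rows shown in the question (the "..." rows are left out).
df = pd.DataFrame({0: [0, 1, 1, 1, 1], 2: [1, 1, 0, 1, 0], 4: [0, 1, 0, 0, 0]})

# All 2**3 = 8 True/False combinations, one position per column.
all_combos = list(product([True, False], repeat=df.shape[1]))

# Combinations that never occur in the data get an explicit count of 0.
full = df.astype(bool).value_counts().reindex(all_combos, fill_value=0)
print(full)
```

Four of the eight combinations occur in these rows, so the other four come back with a count of 0, and the counts still sum to 5.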

Older answer

It looks like you might want something like:

df.value_counts().plot.bar()

Output:

[image: bar plot of df.value_counts()]
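Note that df.value_counts() counts exact row patterns, so a row only adds to the bar of its own pattern. A small sketch contrasting this with counting the rows where all columns of a subset are 1, using the same five rows:

```python
import pandas as pd

# Only the five rows shown in the question (the "..." rows are left out).
df = pd.DataFrame({0: [0, 1, 1, 1, 1], 2: [1, 1, 0, 1, 0], 4: [0, 1, 0, 0, 0]})

# Exact row patterns: the (1, 1, 0) entry counts rows with 1's in
# columns 0 and 2 and a 0 in column 4.
exact = df.value_counts()
print(exact.to_dict())

# Rows with a 1 in both columns 0 and 2, whatever column 4 holds.
both_0_and_2 = int(((df[0] == 1) & (df[2] == 1)).sum())
print(both_0_and_2)
```

Here the (1, 1, 0) pattern has a count of 1, while 2 rows have 1's in both columns 0 and 2, because the 1 1 1 row is included in the second count but not in the first.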

Or, to label each bar with the names of the columns that contain a 1:

(df.reset_index().melt('index', var_name='cols')
   .query('value == 1')
   .groupby('index')['cols'].agg(frozenset)
   .value_counts().plot.bar()
)

Output:

[image: bar plot of the counts per set of column names]
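The frozenset labels print as frozenset({0, 2}) and so on. A small sketch, assuming the same five rows, that turns them into shorter bar labels such as 0 & 2 before plotting (the label format is only an illustrative choice):

```python
import pandas as pd

# Only the five rows shown in the question (the "..." rows are left out).
df = pd.DataFrame({0: [0, 1, 1, 1, 1], 2: [1, 1, 0, 1, 0], 4: [0, 1, 0, 0, 0]})

# For each row, the set of column names holding a 1, counted per distinct set.
sets = (df.reset_index().melt("index", var_name="cols")
          .query("value == 1")
          .groupby("index")["cols"].agg(frozenset)
          .value_counts())

# Replace each frozenset with a sorted, human-readable label.
sets.index = [" & ".join(map(str, sorted(s))) for s in sets.index]
print(sets.to_dict())
# sets.plot.bar() would draw the bars with these labels (requires matplotlib).
```

Rows without any 1 are dropped by the query step, so they get no bar in this version.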