How to display intersection values instead of distinct values in Upset plot

Question

Asked by Yash on 11 May 2023 at 17:21 · 727 views

I tried to create an UpSet plot that displays the intersections among different sets.
However, my plot shows the distinct value counts for each combination of sets.
How do I change it to show intersection sizes instead of distinct counts?

This is my code:

from upsetplot import UpSet, from_contents

mammals = ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose']
herbivores = ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros']
domesticated = ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck']

animals = from_contents({'mammal': mammals, 'herbivore': herbivores, 'domesticated': domesticated})
ax_dict = UpSet(animals, subset_size='count', show_counts=True).plot()

This is my output:

The actual intersection between herbivores and mammals has 5 elements, but my plot shows 2.
Can anyone tell me how to show intersection sizes in an UpSet plot?

There is 1 answer

**Christian Groß** · Accepted Answer · 2023-06-01T21:30:56+00:00

This question is already some days old, but I have not seen an answer yet.

A couple of years ago I faced a similar problem, and I found some old code of mine. The idea is to calculate the intersection sizes manually and then build the input object with upsetplot.from_memberships(), passing the category combinations together with their intersection sizes.
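To make the difference between the two kinds of counts concrete, here is a minimal sketch using plain Python sets, with no upsetplot involved. The names `inclusive` and `exclusive` are labels chosen for this illustration, not upsetplot terminology.

```python
mammals = {'Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose'}
herbivores = {'Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros'}
domesticated = {'Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck'}

# Plain intersection: animals in both sets, whatever else they belong to.
inclusive = mammals & herbivores

# Distinct count: animals in mammals and herbivores but in no other set.
# This is the number the plot in the question shows for that combination.
exclusive = inclusive - domesticated

print(len(inclusive))  # 5
print(len(exclusive))  # 2
```

The 5 versus 2 mismatch from the question is exactly this gap: Horse, Sheep and Cattle are also domesticated, so the plot counts them under the three-way combination instead.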

In your case, try something like this:

import itertools
import upsetplot

mammals = ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose']
herbivores = ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros']
domesticated = ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck']

animals_dict = {"mammals": mammals, "herbivores": herbivores, "domesticated": domesticated}

categories = list(animals_dict)
comb_list = []                 # category combinations, e.g. ('mammals', 'herbivores')
comb_intersection_sizes = []   # size of the intersection for each combination

# compute the intersection size for every combination of 1..n categories
for r in range(1, len(categories) + 1):
    for combo in itertools.combinations(categories, r):
        shared = set.intersection(*(set(animals_dict[c]) for c in combo))
        # skip combinations with an empty intersection
        # (filtering with plain lists avoids building a ragged NumPy array,
        # which recent NumPy versions reject)
        if shared:
            comb_list.append(combo)
            comb_intersection_sizes.append(len(shared))

# create a membership series holding the intersection size of each combination
mem_series = upsetplot.from_memberships(comb_list,
                                        data=comb_intersection_sizes)

upsetplot.plot(mem_series,
               orientation='horizontal',
               show_counts=True)

The drawback of this approach is that the total set sizes (the bars at the bottom left) are inflated: each total is the sum over all intersections that include the set rather than the number of distinct values in it, so the totals are no longer meaningful. This was good enough for my purposes; any further adjustments are up to you.
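As a rough illustration of how far the totals drift, the sketch below recomputes the intersection sizes with plain Python sets and compares the summed totals with the real set sizes. It assumes the totals bar for a set is the sum of all rows that include that set, as described above; the names `sizes`, `inflated` and `actual` are made up for this example.

```python
import itertools

sets = {
    "mammals": {'Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose'},
    "herbivores": {'Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros'},
    "domesticated": {'Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck'},
}

# Intersection size for every combination of categories with a non-empty overlap.
sizes = {}
for r in range(1, len(sets) + 1):
    for combo in itertools.combinations(sets, r):
        shared = set.intersection(*(sets[name] for name in combo))
        if shared:
            sizes[combo] = len(shared)

# Assumed totals: sum over every combination that includes the category.
inflated = {name: sum(n for combo, n in sizes.items() if name in combo)
            for name in sets}
# Real set sizes, for comparison.
actual = {name: len(members) for name, members in sets.items()}

print(inflated)  # {'mammals': 21, 'herbivores': 16, 'domesticated': 18}
print(actual)    # {'mammals': 8, 'herbivores': 5, 'domesticated': 7}
```

If the inflated totals are distracting, the `UpSet` class documents a `totals_plot_elements` parameter, and setting it to 0 should hide the totals bars; check this against the upsetplot version you have installed.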

Here is the plot showing intersection sizes:

[Figure: UpSet plot showing intersection sizes]
