I have created 4 different upset plots using the ComplexUpset package in R. The 4 plots have very different intersection sizes because the data frames range from 300 to 12000 rows. Since I want to compare these 4 plots, I was hoping to use the same y-axis scale for clarity and ease of discussion. I want to normalize the intersection_size data to the range 0 to 1.
After reading the UpSet and ComplexUpset documentation, I see that the intersections are calculated internally and cannot easily be extracted. I see that you can still manipulate the intersection sizes like this:
'Intersection size'=intersection_size(text_mapping=aes(label=paste0(round(
!!get_size_mode('exclusive_intersection')/!!get_size_mode('inclusive_union') * 100
), '%')))
but I couldn't do a normalization like
'Intersection size'=intersection_size(text_mapping=aes(label=paste0(round(
!!get_size_mode('exclusive_intersection')/max(!!get_size_mode('inclusive_union'))
))))
I saw @krassowski's solution to "How to to assign logarithmic scale to “Intersection size” using ComplexUpset library?" and I'm hoping to do something similar, using geom_bar to normalize the values instead of putting them on a log scale. For example, using the movies dataset to produce the following:
library(ComplexUpset)
library(ggplot2)
movies = as.data.frame(ggplot2movies::movies)
movies[movies$mpaa == '', 'mpaa'] = NA
movies = na.omit(movies)
genres = colnames(movies)[18:24]
plot2 <- upset(
movies,
genres,
base_annotations=list('Size'=intersection_size(counts=FALSE)),
min_size=5,
width_ratio=0.1
)
Here, instead of the y-axis scale going from 0 to 400, I would want it to go from 0 to 1, so that I can compare 4 similar upset plots.
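As a way to check what the y-axis values should be, the exclusive intersection sizes can also be computed by hand in base R, outside of ComplexUpset. The following is only a minimal sketch on a made-up membership table: the data frame `toy`, the set columns `A`, `B`, `C`, and the helper `label_row` are hypothetical and not part of any package.

```r
# Toy membership table; A, B and C are hypothetical 0/1 set indicators.
toy <- data.frame(
  A = c(1, 1, 0, 1, 0, 1),
  B = c(0, 1, 1, 1, 0, 0),
  C = c(0, 0, 1, 1, 0, 0)
)
sets <- c("A", "B", "C")

# Label each row by the exact combination of sets it belongs to.
label_row <- function(row) {
  members <- sets[row == 1]
  if (length(members) == 0) "none" else paste(members, collapse = "&")
}
toy$intersection <- apply(toy[sets], 1, label_row)

# Exclusive intersection sizes: number of rows per exact combination.
tab <- table(toy$intersection)
sizes <- setNames(as.vector(tab), names(tab))

# Divide by the largest count so that the tallest bar equals 1.
normalized <- sizes / max(sizes)
print(normalized)
```

Running the same steps on the real indicator columns should give values that can be compared against the bar heights in the plot.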
------ Solved:------
I have done the following to normalize the intersection size (y = y/max(y)):
presence = ComplexUpset:::get_mode_presence('exclusive_intersection')
summarise_values = function(df){
aggregate(
as.formula(paste0(presence, '~intersection')),
df,
FUN = sum
)
}
upset(
movies,
genres,
base_annotations=list(
'Normalized intersection size'=(
ggplot()
+ geom_bar(
data=summarise_values,
stat='identity',
aes(y=!!presence / max(!!presence))
)
)
),
width_ratio=0.1
)
I think the results make sense as I'm seeing them, but if anyone sees any logical mistake, feel free to leave a comment.
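One point that may be worth checking is which normalization suits the comparison. Dividing by the maximum makes the tallest bar equal to 1 in every plot, whereas dividing by the total gives each bar as a fraction of all counted rows. The sketch below contrasts the two on made-up counts; the vectors `small` and `large` and the helpers `by_max` and `by_total` are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical exclusive intersection counts for two data sets of different size.
small <- c(A = 120, B = 90, `A&B` = 90)        # 300 rows in total
large <- c(A = 6000, B = 1200, `A&B` = 4800)   # 12000 rows in total

by_max   <- function(counts) counts / max(counts)  # tallest bar becomes 1
by_total <- function(counts) counts / sum(counts)  # bars sum to 1

print(by_max(small))
print(by_max(large))
print(by_total(small))
print(by_total(large))
```

With `by_max`, intersection A has height 1 in both data sets even though it covers 40% of the rows in one and 50% in the other; `by_total` keeps that difference visible. If proportions are what should be compared, replacing max with sum in the aes() call above would presumably be the analogous change, although I have not verified how that interacts with options such as min_size.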