R: Grouped boxplot with 2 X-variables, in each group compare all samples vs. one X2 group


I am trying to generate a grouped boxplot in ggplot2 with two x variables. This is straightforward with

ggplot(boxplot_classes, aes(x=Group, y=Value, fill=Mutation)) + 
geom_boxplot(position=position_dodge(0.8))

However, I do not want to compare the two subgroups defined by the second x variable with each other. Instead, for each group defined by the first x variable, I need to compare all samples in that group against one single subgroup from the second x variable.

Here is an example. The data looks like this:

Value   Mutation    Group
32.00   Yes 1
5.00    no  1
18.00   no  1
3.00    no  1
16.00   no  1
14.00   Yes 1
28.00   Yes 1
28.00   Yes 1
49.00   Yes 1
15.00   Yes 1
43.00   no  2
49.00   Yes 2
40.00   Yes 2
17.00   Yes 2
9.00    no  2
31.00   Yes 2
8.00    Yes 2
43.00   no  2
50.00   Yes 2
48.00   Yes 2
11.00   Yes 3
42.00   no  3
0.00    Yes 3
15.00   Yes 3
8.00    no  3
1.00    Yes 3
41.00   no  3
15.00   no  3
4.00    no  3
31.00   Yes 3

I would like to generate a figure where, in each "Group" (in the example above: 1, 2, 3), two boxplots are drawn: one for all samples in this "Group" and one only for those samples in the group that also have Mutation == "Yes". In the real data, many more "Groups" are present.

I hope I have explained my problem well. Unfortunately, I cannot work out the correct syntax or how the data has to be rearranged.

Thank you very much for any help!

EDIT: I uploaded an example of the figure I am trying to generate at https://s28.postimg.org/hvq8pb25p/Folie1.jpg


There are 2 answers

bouncyball On BEST ANSWER

If we play with your data a bit, we can do it. Suppose your data is in dat:

dat_yes <- dat[dat$Mutation == 'Yes',] #subset only Yes
dat_yes$Mutation_2 <- 'Yes' #add column
dat$Mutation_2 <- 'All' #add column

dat_full <- rbind(dat, dat_yes) #put together

#plot (needs ggplot2 loaded)
library(ggplot2)
ggplot(dat_full, aes(x = factor(Group), y = Value))+
    geom_boxplot(aes(fill = Mutation_2))+
    xlab('Group') + 
    scale_fill_brewer(palette = 'Set1', name = 'Mutation')

First, we create a subset of your data called dat_yes, which contains only the rows with Mutation == 'Yes', and give it a new column Mutation_2 that is always 'Yes'. We then add a Mutation_2 column to your original data that is always 'All'. Next, we rbind dat and dat_yes to create dat_full, so every 'Yes' row appears twice: once as part of 'All' and once as 'Yes'. Finally, we send dat_full to ggplot.
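A quick way to confirm that the stacking behaves as described is to count the rows behind each box. This is only a sketch in base R: it rebuilds dat from the question's table (using rep() for Group, which is a shortcut and not part of the answer) and repeats the same steps. The names counts and medians are made up for illustration.

```r
# Data copied from the question (30 rows, 3 groups of 10)
dat <- data.frame(
  Value = c(32, 5, 18, 3, 16, 14, 28, 28, 49, 15,
            43, 49, 40, 17, 9, 31, 8, 43, 50, 48,
            11, 42, 0, 15, 8, 1, 41, 15, 4, 31),
  Mutation = c("Yes", "no", "no", "no", "no", "Yes", "Yes", "Yes", "Yes", "Yes",
               "no", "Yes", "Yes", "Yes", "no", "Yes", "Yes", "no", "Yes", "Yes",
               "Yes", "no", "Yes", "Yes", "no", "Yes", "no", "no", "no", "Yes"),
  Group = rep(1:3, each = 10),
  stringsAsFactors = FALSE
)

# Same stacking as in the answer
dat_yes <- dat[dat$Mutation == "Yes", ]  # mutated samples only
dat_yes$Mutation_2 <- "Yes"
dat$Mutation_2 <- "All"
dat_full <- rbind(dat, dat_yes)

# Number of samples behind each box (rows: Mutation_2, columns: Group)
counts <- table(dat_full$Mutation_2, dat_full$Group)
print(counts)

# Medians that the boxes should show (hypothetical helper table)
medians <- aggregate(Value ~ Group + Mutation_2, data = dat_full, FUN = median)
print(medians)
```

The 'All' row should contain every sample of a group and the 'Yes' row only the mutated ones, so the 'Yes' samples are counted twice in dat_full by design.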


data

dat <- structure(list(Value = c(32, 5, 18, 3, 16, 14, 28, 28, 49, 15, 
43, 49, 40, 17, 9, 31, 8, 43, 50, 48, 11, 42, 0, 15, 8, 1, 41, 
15, 4, 31), Mutation = c("Yes", "no", "no", "no", "no", "Yes", 
"Yes", "Yes", "Yes", "Yes", "no", "Yes", "Yes", "Yes", "no", 
"Yes", "Yes", "no", "Yes", "Yes", "Yes", "no", "Yes", "Yes", 
"no", "Yes", "no", "no", "no", "Yes"), Group = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), .Names = c("Value", 
"Mutation", "Group"), class = "data.frame", row.names = c(NA, 
-30L))
shu251 On

Do you have an example of the plot you want?

You can try facet_grid() or facet_wrap(), combined with subsetting the data to get the Mutation == "Yes" part.

Try this:

plot_base <- ggplot(boxplot_classes, aes(x=Mutation, y=Value, fill=Mutation)) + geom_boxplot(position=position_dodge(0.8)) + facet_grid(Mutation~Group)

Look at other options for facet_grid and facet_wrap to modify further.

To get only the Mutation == "Yes" part:

plot_base %+% subset(boxplot_classes, Mutation %in% "Yes")
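If you want to stay with facets, one possible variation is to facet a stacked copy of the data, so each panel shows one Group with an "All" box next to a "Yes" box. This is a sketch under assumptions and has not been checked against the real data: the data frame is rebuilt from the question's table, and the names dat_full, Mutation_2 and p are only illustrative.

```r
library(ggplot2)

# Data copied from the question (30 rows, 3 groups of 10)
dat <- data.frame(
  Value = c(32, 5, 18, 3, 16, 14, 28, 28, 49, 15,
            43, 49, 40, 17, 9, 31, 8, 43, 50, 48,
            11, 42, 0, 15, 8, 1, 41, 15, 4, 31),
  Mutation = c("Yes", "no", "no", "no", "no", "Yes", "Yes", "Yes", "Yes", "Yes",
               "no", "Yes", "Yes", "Yes", "no", "Yes", "Yes", "no", "Yes", "Yes",
               "Yes", "no", "Yes", "Yes", "no", "Yes", "no", "no", "no", "Yes"),
  Group = rep(1:3, each = 10),
  stringsAsFactors = FALSE
)

# Stack every row (labelled "All") on top of the mutated rows (labelled "Yes")
dat_full <- rbind(
  transform(dat, Mutation_2 = "All"),
  transform(dat[dat$Mutation == "Yes", ], Mutation_2 = "Yes")
)

# One panel per Group, two boxes per panel
p <- ggplot(dat_full, aes(x = Mutation_2, y = Value, fill = Mutation_2)) +
  geom_boxplot() +
  facet_wrap(~ Group, labeller = label_both) +
  labs(x = NULL, fill = "Samples")

print(p)
```

With many groups, facet_wrap() lays the panels out over several rows, which may be easier to read than one long x axis.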