I am trying to generate a grouped boxplot in ggplot2 with two x variables. This is straightforward with:
ggplot(boxplot_classes, aes(x=Group, y=Value, fill=Mutation)) +
geom_boxplot(position=position_dodge(0.8))
However, I do not need to compare the two subgroups defined by the second x variable. Instead, for each group defined by the first x variable, I need to compare all samples in that group with a single subgroup from the second x variable.
Here is an example. The data looks like this:
Value Mutation Group
32.00 Yes 1
5.00 no 1
18.00 no 1
3.00 no 1
16.00 no 1
14.00 Yes 1
28.00 Yes 1
28.00 Yes 1
49.00 Yes 1
15.00 Yes 1
43.00 no 2
49.00 Yes 2
40.00 Yes 2
17.00 Yes 2
9.00 no 2
31.00 Yes 2
8.00 Yes 2
43.00 no 2
50.00 Yes 2
48.00 Yes 2
11.00 Yes 3
42.00 no 3
0.00 Yes 3
15.00 Yes 3
8.00 no 3
1.00 Yes 3
41.00 no 3
15.00 no 3
4.00 no 3
31.00 Yes 3
I would like to generate a figure where two boxplots are drawn for each "Group" (in the example above: 1, 2, 3): one for all samples in that group, and one only for those samples in the group that also have Mutation == "Yes". In the real data, many more "Groups" are present.
I hope I have explained my problem clearly. Unfortunately, I cannot work out the correct syntax or how the data needs to be rearranged.
Thank you very much for any help!
EDIT: I uploaded an example of the figure I am trying to generate at https://s28.postimg.org/hvq8pb25p/Folie1.jpg
If we play with your data a bit, we can do it. Suppose your data is in dat:
First, we create a subset of your data called dat_yes, which only contains the rows with Mutation == 'Yes'. We then create a new column in dat_yes called Mutation_2 which takes the value of 'Yes' only. We then add a column to your original data called Mutation_2 which only takes the value of 'All'. Then, we rbind dat and dat_yes to create dat_full. Finally, we send dat_full to ggplot.
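The steps described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code the answer had in mind: it rebuilds dat from the example table in the question, and the factor(Group) conversion, the dodge width of 0.8 (taken from the question's own call), and the axis and legend labels are assumptions made for illustration.

```r
library(ggplot2)

# Example data copied from the question (10 rows per group).
dat <- data.frame(
  Value = c(32, 5, 18, 3, 16, 14, 28, 28, 49, 15,
            43, 49, 40, 17, 9, 31, 8, 43, 50, 48,
            11, 42, 0, 15, 8, 1, 41, 15, 4, 31),
  Mutation = c("Yes", "no", "no", "no", "no", "Yes", "Yes", "Yes", "Yes", "Yes",
               "no", "Yes", "Yes", "Yes", "no", "Yes", "Yes", "no", "Yes", "Yes",
               "Yes", "no", "Yes", "Yes", "no", "Yes", "no", "no", "no", "Yes"),
  Group = rep(1:3, each = 10)
)

# Subset containing only the mutated samples, labelled 'Yes'.
dat_yes <- dat[dat$Mutation == "Yes", ]
dat_yes$Mutation_2 <- "Yes"

# Label every row of the original data as 'All'.
dat$Mutation_2 <- "All"

# Stack both: mutated samples now appear twice, once under each label.
dat_full <- rbind(dat, dat_yes)

# One 'All' box and one 'Yes' box per group, drawn side by side.
# factor(Group) is an assumption: it makes ggplot treat Group as discrete.
p <- ggplot(dat_full, aes(x = factor(Group), y = Value, fill = Mutation_2)) +
  geom_boxplot(position = position_dodge(0.8)) +
  labs(x = "Group", fill = "Samples")
print(p)
```

Because the 'Yes' rows are duplicated on purpose, dat_full is only suitable for plotting; any statistics should still be computed from the original dat.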