Color gradient and shading based on multiple variables in ggplot2 (r)

I was wondering how to apply a more advanced color scheme in ggplot2 without having to set every color manually.

Here's a picture of what I have so far: [image: current graph]

I would like all the data to follow a gradient, with every Actinobacteria category drawn in a different shade of one color, every Firmicutes category in shades of another color, and so on. I can add a column to my data frame that records the taxonomic group each category belongs to, but how would I use it, or some other method, to get the colors I am looking for?
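The grouping column does not have to be maintained by hand: every fill label already starts with its phylum, optionally followed by a detail in parentheses. A minimal base-R sketch, assuming all labels follow that "Phylum (detail)" pattern (the names labels and phylum are hypothetical):

```r
# The ten category labels, as they appear in the melted data's `variable` factor
labels <- c("Actinobacteria (Propionibacterium)", "Actinobacteria (other)",
            "Actinobacteria (Corynebacterium)", "Actinobacteria (Micrococcaceae)",
            "Firmicutes (Streptococcus)", "Firmicutes (other)",
            "Firmicutes (Staphylococcus)",
            "Firmicutes (Clostridiales_Incertae_Sedis_XI)",
            "Proteobacteria", "other")

# Drop everything from the first " (" onward; labels without parentheses
# (e.g. "Proteobacteria", "other") are returned unchanged
phylum <- sub(" \\(.*$", "", labels)

# Keep phyla in order of first appearance so legends stay grouped
phylum <- factor(phylum, levels = unique(phylum))
table(phylum)
```

On the melted data the same call would be ggfdata$phylum <- sub(" \\(.*$", "", ggfdata$variable), since sub() coerces a factor to character.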

Here's how the data looks before melting: [image: data that gets plotted]

The sample and nseqs columns are irrelevant here. The group column determines which panel of the grid each bar is plotted in (in this case, grouping by race). Each bacteria type is its own column, and each row holds the percentage of every bacteria type in one sample, so the bacteria columns of each row add up to 100. The xorder column is predetermined and sets the position of each bar within its panel.
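Because each row is a complete breakdown of one sample, a quick sanity check before melting is that the bacteria columns of every row total 100. A minimal base-R sketch; instead of the real fdata (not shown), it assumes the bacteria columns have been pulled out as a numeric matrix, rebuilt here by hand from the three-sample dput() values given in this question:

```r
# Percentages for the three simplified samples, copied from the dput() in this
# question. matrix() fills column by column, so every three consecutive values
# are one bacteria type across Abidjan, Tanger and Salvador.
pct <- matrix(c(28.4151076597247, 14.3313054752169, 68.823080833757,
                25.5794799388163, 19.8663608257704, 6.17691916624301,
                28.9210495352394, 51.4411090056847, 6.72343670564311,
                0.176491351923756, 0.1196768724444, 4.28317234367056,
                1.00011766090128, 0.3889498354443, 2.99949161159126,
                1.69431697846806, 2.4434028124065, 4.30859176410778,
                2.05906577244382, 1.1169841428144, 0.699034062023386,
                9.80115307683257, 7.28034307370101, 0.59735638027453,
                1.6825508883398, 2.3237259399621, 3.34265378749364,
                0.670667137310272, 0.688142016555301, 2.04626334519573),
              nrow = 3,
              dimnames = list(c("Abidjan.534R", "Tanger.534R", "Salvador.534R"), NULL))

# One row per sample, one column per bacteria type; each row should total 100
# up to floating-point rounding
round(rowSums(pct), 6)
```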

Again, I want the bars to be colored not only by their fill = variable mapping but also by the type each variable belongs to, so that all the Actinobacteria categories share similar shades, and so on.
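One common way to get "shades within a color family" is to build the fill palette yourself: pick one base color per phylum, generate lighter shades of it with grDevices::colorRampPalette(), and pass the resulting named vector to scale_fill_manual(). A sketch under assumptions: the helper shade_palette() and the base colors are hypothetical choices, and the phylum of each label is given explicitly:

```r
# Hypothetical helper: give every category a shade of its group's base colour.
#   categories: category labels (one per fill level)
#   groups:     the group each category belongs to (same length)
#   base_cols:  named vector mapping each group to a base colour
shade_palette <- function(categories, groups, base_cols) {
  categories <- as.character(categories)
  groups <- as.character(groups)
  pal <- character(length(categories))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    # Ramp from the base colour toward white; ask for one extra step and drop
    # the pure-white end so every category stays visible
    ramp <- grDevices::colorRampPalette(c(base_cols[[g]], "white"))
    pal[idx] <- ramp(length(idx) + 1)[seq_along(idx)]
  }
  setNames(pal, categories)
}

labels <- c("Actinobacteria (Propionibacterium)", "Actinobacteria (other)",
            "Actinobacteria (Corynebacterium)", "Actinobacteria (Micrococcaceae)",
            "Firmicutes (Streptococcus)", "Firmicutes (other)",
            "Firmicutes (Staphylococcus)",
            "Firmicutes (Clostridiales_Incertae_Sedis_XI)",
            "Proteobacteria", "other")
phyla <- rep(c("Actinobacteria", "Firmicutes", "Proteobacteria", "other"),
             times = c(4, 4, 1, 1))
pal <- shade_palette(labels, phyla,
                     c(Actinobacteria = "firebrick", Firmicutes = "steelblue",
                       Proteobacteria = "darkorange", other = "grey40"))
pal
```

The vector would then go into the plot as scale_fill_manual(values = pal). Because it is named, ggplot2 matches each level of variable to its shade by name rather than by position.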

This is the current segment of code that deals with plotting:

library(ggplot2)
library(reshape2)  # provides melt()
fdata$nseqs <- factor(fdata$nseqs)
fdata$xorder <- factor(fdata$xorder)
ggfdata <- melt(fdata, id.vars = c('group', 'nseqs', 'sample', 'xorder'))  # wide -> long: one row per sample and bacteria type
p <- ggplot(ggfdata, aes(x = xorder, y = value, fill = variable)) + geom_bar(stat = 'identity') +
  facet_grid(~group, scales = 'free_x', space = 'free_x') + scale_y_continuous() + scale_x_discrete() +
  labs(title = paste('Taxonomic Distribution - grouped by', colnames(meta.frame)[i])) + ylab('Percentage') +
  xlab('(sorted within group by increasing number of sequences)') + scale_color_brewer() +  # colour aesthetic only; does not change the fill
  theme(axis.ticks.x = element_blank(), axis.text.x = element_blank())
print(p)
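Since the plot already reaches for a Brewer scale, another option is one sequential ColorBrewer palette per phylum (Reds, Blues, and so on), combined and handed to scale_fill_manual(). Note that scale_color_brewer() cannot do this, because it only affects the colour aesthetic, not fill. A sketch, assuming RColorBrewer is installed (it is usually pulled in with ggplot2 through the scales package) and using the three-sample data from the dput() in this question. The phylum-to-palette assignment is a hypothetical choice, and the title built from meta.frame is left out:

```r
library(ggplot2)
library(RColorBrewer)

# Melted data for the three simplified samples; values copied from the dput()
# in this question, in the same order (sample varies fastest)
labels <- c("Actinobacteria (Propionibacterium)", "Actinobacteria (other)",
            "Actinobacteria (Corynebacterium)", "Actinobacteria (Micrococcaceae)",
            "Firmicutes (Streptococcus)", "Firmicutes (other)",
            "Firmicutes (Staphylococcus)",
            "Firmicutes (Clostridiales_Incertae_Sedis_XI)",
            "Proteobacteria", "other")
ggfdata <- data.frame(
  group    = rep(c("af", "as", "af"), times = 10),
  xorder   = factor(rep(1:3, times = 10)),
  variable = factor(rep(labels, each = 3), levels = labels),
  value    = c(28.4151076597247, 14.3313054752169, 68.823080833757,
               25.5794799388163, 19.8663608257704, 6.17691916624301,
               28.9210495352394, 51.4411090056847, 6.72343670564311,
               0.176491351923756, 0.1196768724444, 4.28317234367056,
               1.00011766090128, 0.3889498354443, 2.99949161159126,
               1.69431697846806, 2.4434028124065, 4.30859176410778,
               2.05906577244382, 1.1169841428144, 0.699034062023386,
               9.80115307683257, 7.28034307370101, 0.59735638027453,
               1.6825508883398, 2.3237259399621, 3.34265378749364,
               0.670667137310272, 0.688142016555301, 2.04626334519573)
)

# Hypothetical mapping from phylum to a sequential ColorBrewer palette
phylum <- sub(" \\(.*$", "", labels)
brewer_names <- c(Actinobacteria = "Reds", Firmicutes = "Blues",
                  Proteobacteria = "Oranges", other = "Greys")

fill_values <- setNames(character(length(labels)), labels)
for (ph in unique(phylum)) {
  members <- labels[phylum == ph]
  # brewer.pal() needs n >= 3; reverse so the darkest shades come first and
  # single-member phyla still get a strong colour
  shades <- rev(brewer.pal(max(3, length(members)), brewer_names[[ph]]))
  fill_values[members] <- shades[seq_along(members)]
}

p <- ggplot(ggfdata, aes(x = xorder, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_grid(~group, scales = "free_x", space = "free_x") +
  scale_fill_manual(values = fill_values) +
  labs(y = "Percentage", x = "(sorted within group by increasing number of sequences)") +
  theme(axis.ticks.x = element_blank(), axis.text.x = element_blank())
print(p)
```

Because the levels of variable are already ordered by phylum, the shades of one phylum stack next to each other in every bar and sit together in the legend.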


Simplified raw dataset (only 3 samples, i.e. 3 bars): [image: the data in an Excel table]

Here is the dput() of the melted raw simplified data:

samples <- structure(list(group = c("af", "as", "af", "af", "as", "af", 
"af", "as", "af", "af", "as", "af", "af", "as", "af", "af", "as", 
"af", "af", "as", "af", "af", "as", "af", "af", "as", "af", "af", 
"as", "af"), nseqs = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 1L, 
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 
2L, 1L, 1L, 2L, 1L, 1L, 2L), class = "factor", .Label = c("1", 
"2")), sample = c("Abidjan.534R", "Tanger.534R", "Salvador.534R", 
"Abidjan.534R", "Tanger.534R", "Salvador.534R", "Abidjan.534R", 
"Tanger.534R", "Salvador.534R", "Abidjan.534R", "Tanger.534R", 
"Salvador.534R", "Abidjan.534R", "Tanger.534R", "Salvador.534R", 
"Abidjan.534R", "Tanger.534R", "Salvador.534R", "Abidjan.534R", 
"Tanger.534R", "Salvador.534R", "Abidjan.534R", "Tanger.534R", 
"Salvador.534R", "Abidjan.534R", "Tanger.534R", "Salvador.534R", 
"Abidjan.534R", "Tanger.534R", "Salvador.534R"), xorder = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), class = "factor", .Label = c("1", 
"2", "3")), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 
8L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("Actinobacteria (Propionibacterium)", 
"Actinobacteria (other)", "Actinobacteria (Corynebacterium)", 
"Actinobacteria (Micrococcaceae)", "Firmicutes (Streptococcus)", 
"Firmicutes (other)", "Firmicutes (Staphylococcus)", "Firmicutes (Clostridiales_Incertae_Sedis_XI)", 
"Proteobacteria", "other"), class = "factor"), value = c(28.4151076597247, 
14.3313054752169, 68.823080833757, 25.5794799388163, 19.8663608257704, 
6.17691916624301, 28.9210495352394, 51.4411090056847, 6.72343670564311, 
0.176491351923756, 0.1196768724444, 4.28317234367056, 1.00011766090128, 
0.3889498354443, 2.99949161159126, 1.69431697846806, 2.4434028124065, 
4.30859176410778, 2.05906577244382, 1.1169841428144, 0.699034062023386, 
9.80115307683257, 7.28034307370101, 0.59735638027453, 1.6825508883398, 
2.3237259399621, 3.34265378749364, 0.670667137310272, 0.688142016555301, 
2.04626334519573)), .Names = c("group", "nseqs", "sample", "xorder", 
"variable", "value"), row.names = c(NA, -30L), class = "data.frame")