Complicated stacked barplot in R using a color column


I am trying to make a stacked barplot in R. The main sticking point is applying the colors from the color column to the right species.

Requirements of the plot:

  • Each bar (x axis) should represent one time point.
  • Each species should be drawn in its own color (given by the color column), with its share of the bar reflecting its abundance (y axis).
  • Within each bar, species in the same phylum should be grouped together.
  • Setting the width of each bar would be really cool, but is not necessary.
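Requirements 2 and 3 both depend on the row order of the species x time matrix. For a matrix, `barplot()` stacks the rows from bottom to top in matrix order and applies `col` in that same order.

The sketch below is minimal base R. It assumes long-format data like the representative rows further down (only a few are copied here) and an illustrative time order of D7 then D30. The names `sp`, `mat` and `cols` are hypothetical.

```r
# A few rows copied from the representative data; stringsAsFactors = FALSE keeps
# the hex colors as character strings rather than factor codes
dat <- data.frame(
  phyla     = c("Actinobacteria", "Firmicutes", "Firmicutes", "Bacteroidetes",
                "Firmicutes", "Actinobacteria"),
  species   = c("Bifidobacterium_adolescentis", "Faecalibacterium_prausnitzii",
                "Catenibacterium_mitsuokai", "Bacteroides_ovatus",
                "Faecalibacterium_prausnitzii", "Bifidobacterium_adolescentis"),
  abundance = c(18.73529, 14.118, 12.51944, 7.52241, 21.11866, 10.21989),
  color     = c("#F7FBFF", "#F7FCF5", "#F3F9F2", "#FFF5EB", "#F7FCF5", "#F7FBFF"),
  time      = c("D30", "D30", "D30", "D30", "D7", "D7"),
  stringsAsFactors = FALSE
)

# One row per species, sorted by phylum first so each phylum is a contiguous block
sp <- unique(dat[, c("phyla", "species", "color")])
sp <- sp[order(sp$phyla, sp$species), ]

# Factor levels fix the row order (species) and the column order (time)
species_f <- factor(dat$species, levels = sp$species)
time_f    <- factor(dat$time, levels = c("D7", "D30"))   # assumed time order

# Species x time matrix of summed abundance; species absent at a time get 0
mat <- tapply(dat$abundance, list(species_f, time_f), sum)
mat[is.na(mat)] <- 0

# Look colors up by row name so they always follow the matrix rows
cols <- sp$color[match(rownames(mat), sp$species)]

# Rows are stacked bottom to top in matrix order, colored in that same order
barplot(mat, col = cols)
```

Because both the rows and the colors come from the same sorted `sp` table, adding or dropping species cannot shift the colors onto the wrong species.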

Characteristics of the dataset:

  • Each species has its own color, and the colors form a gradient within each phylum.
  • The abundances of all species at a given time sum to 100.
  • Not every species is present at every time.
  • There are 7 times, 8 phyla, and 132 species.
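These characteristics are easy to check before plotting. The sketch below is minimal and assumes the same column names as the data below.

The rows are copied from the representative data. That data is only a subset, so the per-time totals here are below 100. The rescaling step (the column `pct` is a hypothetical name) is only needed if the totals drift from 100, for example after filtering species.

```r
dat <- data.frame(
  species   = c("Faecalibacterium_prausnitzii", "Ruminococcus_sp_5_1_39BFAA",
                "Bifidobacterium_adolescentis", "Bifidobacterium_adolescentis"),
  abundance = c(21.11866, 13.54397, 10.21989, 38.17028),
  color     = c("#F7FCF5", "#92B09C", "#F7FBFF", "#F7FBFF"),
  time      = c("D7", "D7", "D7", "D90"),
  stringsAsFactors = FALSE
)

# Each species should map to exactly one color; collect any that do not
n_cols <- tapply(dat$color, dat$species, function(x) length(unique(x)))
bad_species <- names(n_cols)[n_cols != 1]

# Per-time totals (about 100 when the full species list is present)
totals <- tapply(dat$abundance, dat$time, sum)

# Optional: rescale within each time so the stacked bars all reach 100
dat$pct <- ave(dat$abundance, dat$time, FUN = function(x) 100 * x / sum(x))
```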

Other ideas on how to represent these data are welcome.

Representative data:

phyla           species                         abundance    color    time
Actinobacteria  Bifidobacterium_adolescentis    18.73529    #F7FBFF   D30
Firmicutes      Faecalibacterium_prausnitzii    14.118      #F7FCF5   D30
Firmicutes      Catenibacterium_mitsuokai       12.51944    #F3F9F2   D30
Bacteroidetes   Bacteroides_ovatus              7.52241     #FFF5EB   D30
Firmicutes      Faecalibacterium_prausnitzii    21.11866    #F7FCF5   D7
Firmicutes      Ruminococcus_sp_5_1_39BFAA      13.54397    #92B09C   D7
Actinobacteria  Bifidobacterium_adolescentis    10.21989    #F7FBFF   D7
Actinobacteria  Bifidobacterium_adolescentis    38.17028    #F7FBFF   D90
Firmicutes      Catenibacterium_mitsuokai       11.04982    #F3F9F2   D90
Firmicutes      Faecalibacterium_prausnitzii    9.82507     #F7FCF5   D90
Actinobacteria  Collinsella_aerofaciens         5.2334      #D4DEE9   D90
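To make the example reproducible, the table can be read straight into R. There is one catch: by default `read.table()` treats `#` as the start of a comment. That would cut every row off at the color column, so `comment.char = ""` is needed. The sketch below uses the illustrative variable names `txt` and `dat`.

```r
# The representative table pasted as text
txt <- "
phyla           species                         abundance    color    time
Actinobacteria  Bifidobacterium_adolescentis    18.73529    #F7FBFF   D30
Firmicutes      Faecalibacterium_prausnitzii    14.118      #F7FCF5   D30
Firmicutes      Catenibacterium_mitsuokai       12.51944    #F3F9F2   D30
Bacteroidetes   Bacteroides_ovatus              7.52241     #FFF5EB   D30
Firmicutes      Faecalibacterium_prausnitzii    21.11866    #F7FCF5   D7
Firmicutes      Ruminococcus_sp_5_1_39BFAA      13.54397    #92B09C   D7
Actinobacteria  Bifidobacterium_adolescentis    10.21989    #F7FBFF   D7
Actinobacteria  Bifidobacterium_adolescentis    38.17028    #F7FBFF   D90
Firmicutes      Catenibacterium_mitsuokai       11.04982    #F3F9F2   D90
Firmicutes      Faecalibacterium_prausnitzii    9.82507     #F7FCF5   D90
Actinobacteria  Collinsella_aerofaciens         5.2334      #D4DEE9   D90
"

# comment.char = "" keeps the hex colors; stringsAsFactors = FALSE keeps them as text
dat <- read.table(text = txt, header = TRUE, comment.char = "",
                  stringsAsFactors = FALSE)
str(dat)
```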

Thank you in advance; I am banging my head against the wall with this.

Update: here is the code I ended up with, thanks to Robert's answer.

#reshape the dataframes as matrices
#species are row names and times are columns (abundance data makes up matrix)
#put the matrix times in the correct order
#create stacked barplot that has the width of column reflecting shannon index
#save the stacked barplots in files named by the entry list
for(i in seq_len(n)){
  # sum abundance per species and time, then sort so species of one phylum are adjacent
  phyl <- aggregate(abundance ~ phyla + species + color + time, dfs[[i]], sum)
  phyl <- phyl[with(phyl, order(phyla, species, time)), ]
  # long -> wide: one row per species, one "abundance.<time>" column per time
  wide <- reshape(phyl, idvar = c("phyla", "species", "color"),
                  timevar = "time", direction = "wide")
  wide[is.na(wide)] <- 0   # species absent at a time point get 0

  res1 <- as.matrix(wide[, -c(1:3)])
  colnames(res1) <- sub("^abundance\\.", "", colnames(res1))   # "abundance.D30" -> "D30"
  rownames(res1) <- wide$species
  res1 <- res1[, c('E', 'FMT', 'PA', 'PF', 'D7', 'D30', 'D90')]   # errors if a time is missing

  # barplot() applies widths in column order, so these must follow the order above
  bar.width <- as.matrix(div.dfs[[i]]['frac'])

  mypath <- file.path(output.path, paste0(project.name, "_", lhs[i], ".tiff"))
  tiff(filename = mypath)
  mytitle <- paste(project.name, lhs[i])
  # as.character() stops a factor color column from being read as palette indices
  barplot(res1, col = as.character(wide$color), beside = FALSE,
          width = c(bar.width), main = mytitle)
  dev.off()
}
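The widths come from `div.dfs[[i]]['frac']`, which isn't shown here. `barplot()` assigns `width` to the bars in column order and recycles it if it is shorter. The widths therefore have to be in the same order as the columns of `res1` after reordering.

The sketch below is minimal. It assumes each width is the Shannon index H = -sum(p * log(p)) of that time point, scaled by the largest H. The scaling and the names `shannon`, `H`, `time_order` and `bar_w` are hypothetical.

```r
# Small species x time matrix (values from the representative data); stands in for res1
res1 <- matrix(c(18.73529, 14.118,   0,
                 10.21989, 21.11866, 13.54397),
               nrow = 3,
               dimnames = list(c("Bifidobacterium_adolescentis",
                                 "Faecalibacterium_prausnitzii",
                                 "Ruminococcus_sp_5_1_39BFAA"),
                               c("D30", "D7")))

# Shannon index of one column: proportions p of the nonzero entries, H = -sum(p log p)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}
H <- apply(res1, 2, shannon)   # named by time

# Reorder columns explicitly, then index the widths by name so they stay in step
time_order <- c("D7", "D30")
res1 <- res1[, time_order]
bar_w <- H[colnames(res1)] / max(H)   # hypothetical scaling: widest bar = 1

barplot(res1, width = bar_w, main = "Bar width ~ Shannon index")
```

Indexing `H` by `colnames(res1)` rather than relying on position means a reordering of the times cannot silently give a bar another time's width.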

#makes the legend and exports it as an EPS file
setwd(output.path)
plot_colors <- as.character(database$color)
sp_names <- as.character(database$species)
setEPS()   # sets postscript() defaults suitable for an EPS file
postscript('legend.eps')
plot.new()
par(xpd=TRUE)
legend("center", legend = sp_names, text.width = max(sapply(sp_names, strwidth)),
       col=plot_colors, lwd=1, cex=.2, horiz = FALSE, ncol=2, bty='n')
par(xpd=FALSE)
dev.off()
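For stacked bars, filled boxes (`fill =`) usually match the plot better than line keys (`col =` with `lwd =`). A border keeps near-white colors such as #F7FBFF visible.

The sketch below is minimal. It assumes a hypothetical one-row-per-species table `sp` (values copied from the representative data), sorted the same way as the bar rows.

```r
# Hypothetical per-species lookup table
sp <- data.frame(
  phyla   = c("Actinobacteria", "Actinobacteria", "Bacteroidetes", "Firmicutes"),
  species = c("Bifidobacterium_adolescentis", "Collinsella_aerofaciens",
              "Bacteroides_ovatus", "Faecalibacterium_prausnitzii"),
  color   = c("#F7FBFF", "#D4DEE9", "#FFF5EB", "#F7FCF5"),
  stringsAsFactors = FALSE
)
# Same phylum-then-species order as the bar rows
sp <- sp[order(sp$phyla, sp$species), ]

plot.new()
# fill = draws one filled box per species; border = "grey40" outlines each box
leg <- legend("center", legend = sp$species, fill = sp$color,
              border = "grey40", ncol = 2, cex = 0.6, bty = "n")
```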

Answer by Robert

This version ignores phyla:

res=tapply(dat$abundance, list(species = dat$species, time = dat$time), sum)
res[is.na(res)]<-0
# tapply() sorts species alphabetically, so look each row's color up by name
cols=as.character(dat$color[match(rownames(res), dat$species)])
barplot(res,col=cols,beside = FALSE,legend.text=TRUE,args.legend=
          list(x = "top",bty="n",cex=.6,ncol=2))
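A caveat on building the color vector: `tapply()` turns `species` into a factor, so the rows of `res` come out in alphabetical order. By contrast, `unique(dat$species)` follows the order in which species first appear in the data.

When the two orders differ, the colors land on the wrong species. Matching on `rownames(res)` avoids that. The small sketch below uses a few rows of the representative data; `cols_by_appearance` and `cols_by_row` are illustrative names.

```r
dat <- data.frame(
  species   = c("Faecalibacterium_prausnitzii", "Bifidobacterium_adolescentis",
                "Bacteroides_ovatus", "Faecalibacterium_prausnitzii"),
  abundance = c(14.118, 18.73529, 7.52241, 21.11866),
  color     = c("#F7FCF5", "#F7FBFF", "#FFF5EB", "#F7FCF5"),
  time      = c("D30", "D30", "D30", "D7"),
  stringsAsFactors = FALSE
)

res <- tapply(dat$abundance, list(species = dat$species, time = dat$time), sum)
res[is.na(res)] <- 0

# Colors in order of first appearance (does not follow the rows of res)
cols_by_appearance <- sapply(unique(dat$species),
                             function(sp) unique(dat$color[dat$species == sp]))
# Colors looked up by row name (always follows the rows of res)
cols_by_row <- dat$color[match(rownames(res), dat$species)]

barplot(res, col = cols_by_row, legend.text = TRUE,
        args.legend = list(x = "top", bty = "n", cex = 0.6))
```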

This version takes phyla into account:

phyl=aggregate(abundance ~ phyla+species+color+time, dat, sum)
phyl=phyl[with(phyl,order(phyla,species,time)),]
wide <- reshape(phyl, idvar = c("phyla","species","color"),
          timevar = "time", direction = "wide")
wide[is.na(wide)]<-0
wide

res1=as.matrix(wide[,-c(1:3)])
# "abundance.D30" -> "D30"; column order follows first appearance, reorder with res1[, c(...)] if needed
colnames(res1)=sub("^abundance\\.", "", colnames(res1))
rownames(res1)=wide$species

barplot(res1,col=as.character(wide$color),beside = FALSE,legend.text=TRUE,args.legend=
          list(x = "top",bty="n",cex=.6,ncol=2))
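To make the phylum grouping visible inside each bar, one option is to draw a line at each phylum boundary. The sketch below is minimal and rests on three assumptions:

- a species x time matrix whose rows are already sorted by phylum (as `res1` is after the `order(phyla, species, time)` step);
- a matching phylum vector;
- the default bar width of 1.

The names `mat`, `phy`, `tops` and `inner` are hypothetical, and the values are copied from the representative data.

```r
mat <- matrix(c(10.21989, 18.73529, 38.17028,
                0,        0,        5.2334,
                0,        12.51944, 11.04982,
                21.11866, 14.118,   9.82507),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("Bifidobacterium_adolescentis", "Collinsella_aerofaciens",
                                "Catenibacterium_mitsuokai", "Faecalibacterium_prausnitzii"),
                              c("D7", "D30", "D90")))
phy  <- c("Actinobacteria", "Actinobacteria", "Firmicutes", "Firmicutes")
cols <- c("#F7FBFF", "#D4DEE9", "#F3F9F2", "#F7FCF5")

mids <- barplot(mat, col = cols)   # x midpoint of each bar (default width 1)

# Sum species within each phylum (keeping stack order), then cumulate up each bar
tops <- apply(rowsum(mat, phy, reorder = FALSE), 2, cumsum)

# Internal boundaries only; the last row is the top of the bar
inner <- tops[-nrow(tops), , drop = FALSE]
for (j in seq_along(mids)) {
  segments(mids[j] - 0.5, inner[, j], mids[j] + 0.5, inner[, j], lwd = 2)
}
```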