Need specific coloring in ggplot2 with viridis


Here's the situation: I am generating complex stacked bar charts with 20+ entries. Downstream, however, these are often reduced to only 5 or 6 entries. I want to take the colors used for this downstream set and carry them back to the more complex upstream samples.

Essentially I want anything that isn't in the final set to be colored gray. I currently don't know how I can go about doing this.

An additional wrinkle is the downstream data does not necessarily have the same shape as the upstream data. For context, this is a complex set of 16S biological sequencing data as well as pure DNA sequencing and classification.

My current thought is to somehow assign a color directly to a specific value, but I'm not entirely sure how to do this and how to determine which color is being displayed downstream by viridis.

Edit: These sets of data should be somewhat indicative of what I'm after:

First Set

SampleID Abundance
A 0.083
B 0.083
C 0.083
D 0.083
E 0.083
F 0.083
G 0.083
H 0.083
I 0.083
J 0.083
K 0.083
L 0.083

Downstream Set

SampleID Abundance
A 0.25
E 0.25
I 0.25
J 0.25

In this case I want A, E, I, and J to have consistent coloring in both plots and the other letters to be gray. I would also prefer the colored entries to stack together, with the gray ones on top. The other option, I guess, is to go back and remove all entries that are not found downstream and then add an asterisk saying, "missing regions are not found downstream."

Edit 2: A mockup of the expected output for the original and downstream data:

Example output


Answer by Jake Kaupp (best answer)

library(tidyverse)
library(viridis)
#> Loading required package: viridisLite

first <- tribble(~SampleID, ~Abundance,
                 "A", 0.083,
                 "B", 0.083,
                 "C", 0.083,
                 "D", 0.083,
                 "E", 0.083,
                 "F", 0.083,
                 "G", 0.083,
                 "H", 0.083,
                 "I", 0.083,
                 "J", 0.083,
                 "K", 0.083,
                 "L", 0.083) %>% 
  mutate(Class = "First")

downstream <- tribble(~SampleID, ~Abundance,
                      "A", 0.25,
                      "E", 0.25,
                      "I", 0.25,
                      "J", 0.25) %>% 
  mutate(Class = "Downstream")

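# One viridis color per SampleID that survives downstream (A, E, I, J)
pal <- viridis(4)

# Map every label to a color: downstream IDs get a viridis color, the rest grey.
# `order` sorts the colored labels ahead of the grey ones.
maps <- tibble(labels = LETTERS[1:12],
       colors = case_when(labels == "A" ~ pal[1],
                          labels == "E" ~ pal[2],
                          labels == "I" ~ pal[3],
                          labels == "J" ~ pal[4],
                          TRUE ~ "Grey50")) %>% 
  mutate(order = ifelse(colors == "Grey50", 2, 1)) %>% 
  arrange(order, labels)

# Named vector (label -> color) in the form scale_fill_manual() expects
values <- set_names(maps$colors, maps$labels)

# Combine both sets; the factor levels follow the colored-first order in `maps`
plot_data <- bind_rows(first, downstream) %>% 
  mutate(SampleID = factor(SampleID, maps$labels),
         Class = factor(Class, c("First","Downstream"))) %>% 
  arrange(Class, SampleID)

# Same fill mapping for both bars, so A, E, I, J keep their colors in each
ggplot(plot_data, aes(x = Class, y = Abundance, fill = SampleID, group = Class)) +
  geom_col() +
  scale_fill_manual("Legend", values = values, breaks = LETTERS[1:12])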

Created on 2018-11-27 by the reprex package (v0.2.1)
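
The `case_when()` above hard-codes the four downstream IDs, which gets tedious with 20+ entries. As a rough sketch only (the helper name `make_fill_values` and its arguments are my own, not part of the answer or of any package), the same named color vector could be derived from the two sets of IDs. It assumes the IDs can be compared as plain character strings and that sorting them alphabetically is an acceptable color order.

```r
library(viridisLite)

# Hypothetical helper: returns a named color vector with the downstream IDs
# first (one viridis color each) followed by all remaining IDs in grey.
make_fill_values <- function(all_ids, kept_ids, other = "grey50") {
  kept   <- sort(unique(as.character(kept_ids)))
  others <- setdiff(sort(unique(as.character(all_ids))), kept)
  c(setNames(viridis(length(kept)), kept),
    setNames(rep(other, length(others)), others))
}

# Example IDs taken from the question
first_ids      <- LETTERS[1:12]
downstream_ids <- c("A", "E", "I", "J")

values <- make_fill_values(first_ids, downstream_ids)
names(values)  # colored IDs first, grey IDs after
```

`values` could then be passed to `scale_fill_manual(values = values)`, and `names(values)` used as the factor levels for `SampleID`, in place of `maps$labels` in the answer. Because the colors are keyed by ID rather than by position, the downstream data does not need to have the same shape as the upstream data.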