Stacked and grouped bar chart in ggplot while maintaining y scale

Question

Asked by Emilia Miller on 09 February 2024 at 17:18 (59 views)

I am working on a project where I am analysing phytoplankton data using R and trying to assess the pattern of how pigment ratios change over the year.

The code for one of my plots is given below. It works well (see the plot by sample ID) in that it plots the pigments as ratios summing to 100%, which is what I want, since we are looking at relative pigment ratios.

However, there is no clear time variable (I want the month to be apparent). If I change the x aes to month, the stacked plots sum to different values because there are different numbers of datapoints for each month (see plot by month).

I have tried using facet_wrap(~month), but in my opinion this makes the data unclear. Ideally, the bars would still sum to 100%, as in the plot by seq, but with clear separation by month, so that I could make something like the plot of pigments with depth (which I made using a different approach), but separated by month.

I hope that this is clear - it is quite a complicated problem and this is my first time asking a question, so I hope it wasn't too waffly!

library(tidyverse)

# shallowest_sample and selected_columns are not shown here
plot_data <- shallowest_sample %>%
  select(all_of(selected_columns)) %>%
  pivot_longer(
    cols = -c(Id, lat, lon, depth, Unique_ID, decy, seq, month),
    names_to = "Pigment",
    values_to = "Percentage"
  ) %>%
  mutate(Pigment = sub("^normalised_", "", Pigment)) %>%
  group_by(month)

# Each `+` must end a line; a line starting with `+` is parsed as a new expression
pigment_ratios_shallow <- ggplot(plot_data, aes(fill = factor(Pigment), y = Percentage, x = seq)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Normalized Accessory Pigment Distribution for surface waters",
       x = "Sample ID", y = "Percentage") +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  theme_minimal()
pigment_ratios_shallow

As suggested in the comments, here is a sample of the data I wish to plot: structure(list(Id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), lat = c(31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667), lon = c(-64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105), depth = c(3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9), Unique_ID = c("200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416"), decy = c(2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738), Pigment = c("peridin", "19.but", "fuco", "19.hex", "prasino", "diadino", "allo", "abcarotene", "lutein", "zea", "peridin", "19.but", "fuco", "19.hex", "prasino", "diadino", "allo", "abcarotene", "lutein", "zea"), Percentage = c(0.015625, 0.09375, 0.078125, 0.140625, 0.03125, 0.0625, 0.015625, 0.046875, 0.015625, 0.5, 0.0151515151515152, 0.106060606060606, 0.0757575757575758, 0.136363636363636, 0.0303030303030303, 0.0606060606060606, 0.0151515151515152, 0.0454545454545455, 0.0151515151515152, 0.5)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

1 answer

**VinceGreg** · Answer 1 · 2024-02-09T18:36:04+00:00

Third attempt; I am new to answering here. position_fill() seems to be useful: it rescales each stacked bar to a constant height of 1. I am not sure what you want on the x-axis, so I used the month-id interaction. Adjusting the space between the bars might improve readability.

library(tidyverse)

# Make an example data frame
df <- tibble(
  id = 1:30,                                     # 30 sample IDs
  month = c(rep(1, 15), rep(2, 10), rep(3, 5)),  # unequal number of samples per month
  comp1 = 0.1,                                   # first component: constant
  comp2 = rnorm(30, mean = 0.5, sd = 0.1)        # second component: random
) %>%
  mutate(comp3 = 1 - comp1 - comp2)              # third component: 1 - others

# One bar per month-id combination; position_fill() rescales each bar to sum to 1
df %>%
  pivot_longer(comp1:comp3) %>%
  ggplot(aes(x = interaction(factor(month), factor(id)), y = value,
             fill = name, group = id)) +
  geom_bar(stat = "identity", col = "black", position = position_fill())
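As one possible way to act on the "space between the bars" idea, here is a minimal sketch that is not part of the original answer. It keeps one bar per sample, filled to 100%, and separates the months with facet_grid(). Unlike facet_wrap(), facet_grid() accepts space = "free_x", so each month's panel is only as wide as its number of samples and all bars keep the same width. The toy data, the fixed seed, and the axis labels are assumptions made for illustration.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Toy data with unequal numbers of samples per month (same shape as in the answer)
df <- data.frame(
  id    = 1:30,
  month = c(rep(1, 15), rep(2, 10), rep(3, 5)),
  comp1 = 0.1,
  comp2 = rnorm(30, mean = 0.5, sd = 0.1)
)
df$comp3 <- 1 - df$comp1 - df$comp2

long <- pivot_longer(df, comp1:comp3)

# One bar per sample, rescaled to 100%; one panel per month.
# scales = "free_x" drops empty x positions in each panel,
# space = "free_x" makes panel width proportional to the number of bars.
p <- ggplot(long, aes(x = factor(id), y = value, fill = name)) +
  geom_col(position = position_fill(), colour = "black") +
  facet_grid(cols = vars(month), scales = "free_x", space = "free_x",
             labeller = label_both) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Sample ID", y = "Percentage")
p
```

With real data, the same pattern should apply with x = seq and fill = Pigment, assuming the data has a month column to facet on.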

Second answer: You could normalize by the number of samples per month.

df <- tibble(
  id = 1:30,                                     # 30 sample IDs
  month = c(rep(1, 15), rep(2, 10), rep(3, 5)),  # unequal number of samples per month
  comp1 = 0.1,                                   # first component: constant
  comp2 = rnorm(30, mean = 0.5, sd = 0.1)        # second component: random
) %>%
  mutate(comp3 = 1 - comp1 - comp2)              # third component: 1 - others

# Divide each value by the number of samples in its month/component group,
# so that every monthly bar sums to 1 regardless of sample count
df %>%
  pivot_longer(comp1:comp3) %>%
  group_by(month, name) %>%
  mutate(n = n()) %>%
  mutate(standardized_value = value / n) %>%
  ggplot(aes(x = factor(month), y = standardized_value,
             fill = name, group = name)) +
  geom_bar(stat = "identity", col = "black")

Compared with my old answer (below), we can now see the variability within each component. However, the unequal number of samples per month makes the graph harder to read.
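To make the link between the two approaches concrete, the following sketch (an illustration added here, using the same kind of toy data with an assumed fixed seed) checks that stacking value / n within a month gives each component the same total height as its monthly mean, and that every monthly bar therefore sums to 1. The names check, stacked_height, monthly_mean, and totals are made up for this example.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed so the toy data is reproducible

df <- data.frame(
  id    = 1:30,
  month = c(rep(1, 15), rep(2, 10), rep(3, 5)),
  comp1 = 0.1,
  comp2 = rnorm(30, mean = 0.5, sd = 0.1)
) %>%
  mutate(comp3 = 1 - comp1 - comp2)

# For each month and component, compare the stacked height of value / n
# with the plain monthly mean of that component
check <- df %>%
  pivot_longer(comp1:comp3) %>%
  group_by(month, name) %>%
  mutate(standardized_value = value / n()) %>%
  summarise(
    stacked_height = sum(standardized_value),
    monthly_mean   = mean(value),
    .groups = "drop"
  )

# Total bar height per month
totals <- check %>%
  group_by(month) %>%
  summarise(total = sum(stacked_height), .groups = "drop")

print(check)
print(totals)
```

The two columns in check agree up to floating-point error, so the only difference between the two plots is that the normalized version also draws one segment per sample.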

I still think that I don't represent the data the right way, but it's difficult to understand what you really want.

First answer:

If I understand correctly, you want to represent the average of each component for each month. My suggestion is to compute these monthly averages first. I made a dataset with unequal sample counts per month:

library(tidyverse)

df <- tibble(
  id = 1:30,                                     # 30 sample IDs
  month = c(rep(1, 15), rep(2, 10), rep(3, 5)),  # unequal number of samples per month
  comp1 = 0.1,                                   # first component: constant
  comp2 = rnorm(30, mean = 0.5, sd = 0.2)        # second component: random
) %>%
  mutate(comp3 = 1 - comp1 - comp2)              # third component: 1 - others

Then I use pivot_longer(), group_by() and summarise() before plotting.

# Average each component within each month, then stack the means
df %>%
  pivot_longer(comp1:comp3) %>%
  group_by(month, name) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  ggplot(aes(x = factor(month), y = mean, fill = name)) +
  geom_bar(stat = "identity")
