I am working on a project where I am analysing phytoplankton data using R and trying to assess the pattern of how pigment ratios change over the year.
The code for one of my plots is given below. It works well (see the plot by sample ID) in that it plots the pigments as proportions summing to 100%, which is what I want, since we are looking at relative pigment ratios.
However, there is no clear time variable (I want the month to be apparent). If I change the x aes to month, the stacked plots sum to different values because there are different numbers of datapoints for each month (see plot by month).
I have tried using facet_wrap(~month), but in my opinion this makes the data unclear. Ideally, the bars would still sum to 100%, as in the plot by seq, but with some clear separation by month too, so that I could make something like the plot of pigments with depth (which I made using a different approach), but with separation by month.
I hope that this is clear - it is quite a complicated problem and this is my first time asking a question, so I hope it wasn't too waffly!
library(dplyr)
library(tidyr)
library(ggplot2)
plot_data <- shallowest_sample %>%
  select(all_of(selected_columns)) %>%
  pivot_longer(cols = -c(Id, lat, lon, depth, Unique_ID, decy, seq, month), names_to = "Pigment", values_to = "Percentage") %>%
  mutate(Pigment = sub("^normalised_", "", Pigment)) %>%  # strip the "normalised_" prefix
  group_by(month)
pigment_ratios_shallow <- ggplot(plot_data, aes(fill = factor(Pigment), y = Percentage, x = seq)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Normalized Accessory Pigment Distribution for surface waters", x = "Sample ID", y = "Percentage") +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  theme_minimal()
pigment_ratios_shallow
As suggested in the comments, here is a sample of the data I wish to plot: structure(list(Id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), lat = c(31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667, 31.667), lon = c(-64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105, -64.105), depth = c(3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9), Unique_ID = c("200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416", "200211131416"), decy = c(2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738, 2002.86738), Pigment = c("peridin", "19.but", "fuco", "19.hex", "prasino", "diadino", "allo", "abcarotene", "lutein", "zea", "peridin", "19.but", "fuco", "19.hex", "prasino", "diadino", "allo", "abcarotene", "lutein", "zea"), Percentage = c(0.015625, 0.09375, 0.078125, 0.140625, 0.03125, 0.0625, 0.015625, 0.046875, 0.015625, 0.5, 0.0151515151515152, 0.106060606060606, 0.0757575757575758, 0.136363636363636, 0.0303030303030303, 0.0606060606060606, 0.0151515151515152, 0.0454545454545455, 0.0151515151515152, 0.5)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
Third attempt (I am new to answering here):
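One way to keep every bar at 100% while still showing the month is position_fill() with a sample-by-month interaction on the x-axis. The following is only a minimal sketch on made-up data: the data frame `toy`, its three pigments and its random values are assumptions for illustration, not the real dataset.

```r
library(ggplot2)

# Made-up long-format data (an assumption): months have different numbers of samples.
set.seed(1)
samples <- data.frame(seq = 1:6, month = c(1, 1, 1, 2, 2, 3))
toy <- merge(samples, data.frame(Pigment = c("fuco", "zea", "allo")), by = NULL)  # cross join
toy$Percentage <- runif(nrow(toy))

# One bar per sample, labelled "sample / month"; position_fill() rescales each bar to 100%.
p_fill <- ggplot(toy, aes(x = interaction(seq, month, sep = " / "), y = Percentage, fill = Pigment)) +
  geom_col(position = position_fill()) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  labs(x = "Sample / month", y = "Percentage") +
  theme_minimal()
p_fill
```

Because the interaction levels are ordered by month first, samples from the same month end up next to each other on the axis.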
position_fill() seems to be useful. I am not sure what you want on the x-axis, so I used the month-id interaction. Adjusting the space between the bars might improve readability.
Second answer:
You could normalize by the number of samples per month.
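A minimal sketch of normalizing by the number of samples per month, on made-up data. The data frame `toy` and the column names `n_samples` and `share` are assumptions for illustration, and the sketch assumes each sample's pigments already sum to 1.

```r
library(dplyr)
library(ggplot2)

# Made-up data (an assumption): month 1 has two samples, month 2 has one.
toy <- data.frame(
  seq = rep(1:3, each = 2),
  month = rep(c(1, 1, 2), each = 2),
  Pigment = rep(c("fuco", "zea"), times = 3),
  Percentage = c(0.4, 0.6, 0.3, 0.7, 0.5, 0.5)
)

# Divide each value by the number of samples in its month,
# so that the stacked bar for every month sums to 1.
per_month <- toy %>%
  group_by(month) %>%
  mutate(n_samples = n_distinct(seq), share = Percentage / n_samples) %>%
  ungroup()

p_month <- ggplot(per_month, aes(x = factor(month), y = share, fill = Pigment)) +
  geom_col(position = "stack", colour = "white") +  # white borders separate the samples
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  labs(x = "Month", y = "Percentage") +
  theme_minimal()
p_month
```

Each sample keeps its own segment within the monthly bar, so the spread of a pigment across samples stays visible.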
Compared with my old answer (below), we can see the variability by component. However, the unbalanced dataset makes the graph harder to read.
I still don't think I am representing the data the right way, but it's difficult to understand what you really want.
First answer:
If I understand correctly, you want to represent the average of each component for each month. My suggestion is to compute the average of each component per month. I made an unbalanced dataset:
Then I use pivot_longer(), group_by() and summarise() before plotting.
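The monthly-average approach could look roughly like this. It is a sketch only: the wide data frame `wide`, its two pigment columns and its values are made up for illustration, with a different number of samples per month.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up unbalanced data (an assumption): three samples in month 1, two in month 2.
wide <- data.frame(
  seq = 1:5,
  month = c(1, 1, 1, 2, 2),
  normalised_fuco = c(0.2, 0.3, 0.4, 0.5, 0.7),
  normalised_zea  = c(0.8, 0.7, 0.6, 0.5, 0.3)
)

# Reshape to long format, then average each pigment within each month.
monthly_mean <- wide %>%
  pivot_longer(cols = -c(seq, month), names_to = "Pigment", values_to = "Percentage") %>%
  mutate(Pigment = sub("^normalised_", "", Pigment)) %>%
  group_by(month, Pigment) %>%
  summarise(Percentage = mean(Percentage), .groups = "drop")

p_mean <- ggplot(monthly_mean, aes(x = factor(month), y = Percentage, fill = Pigment)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  labs(x = "Month", y = "Mean percentage") +
  theme_minimal()
p_mean
```

If every sample's pigments sum to 1, the monthly means also sum to 1, so each bar reaches 100% regardless of how many samples the month has.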