Problem with naming x-axis with ggplot2 in RStudio


I'm trying to create a variation of a Pareto chart. Partway through the code I hit a problem that I haven't been able to solve on my own for several hours. It concerns (1) the order in which ggplot2 plots the data and (2) relabeling the x-axis accordingly.

(1) Since I want an ordered bar plot with a saturation curve, I created a dummy variable (id, numbered 1 to 22 in my data) so that my bars are sorted from high to low, as you can see in the output. Working around this problem created a second one that I can't fix.

(2) I have a column in my df containing all the species I want to show on the x-axis. However, ggplot won't print them there. In fact, since I added the scale_x_discrete command I get no labels on the x-axis at all, and no error message either.

So my question is: is there a way to use my species list as the x-axis? (Remember that my data has to stay sorted from high to low.) Or can someone spot an easy way to solve the labeling problem?

cheers

dfb
     Beech id     proc     kommu    Order
1   Va fla  1 8.749851  8.749851  Psocopt
2       Er  2 7.793812 16.543663    Acari
3 Faga dou  3 7.659406 24.203069     Dipt
4      Tro  4 6.675941 30.879010    Acari
5  Hal ann  5 6.289307 37.168317     Dipt
6    Stigm  6 3.724406 40.892723    Acari
7   Di fag  7 3.642574 44.535297 Lepidopt
8    Phyfa  8 3.390545 47.925842 Neoptera
9   Phylma  9 2.766040 50.691881 Lepidopt

data example:

structure(list(Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann", 
"Stigm", "Di fag", "Phyfa", "Phylma"), id = c(1, 2, 3, 4, 5, 
6, 7, 8, 9), proc = c(8.749851, 7.793812, 7.659406, 6.675941, 
6.289307, 3.724406, 3.642574, 3.390545, 2.76604), kommu = c(8.749851, 
16.543663, 24.203069, 30.87901, 37.168317, 40.892723, 44.535297, 
47.925842, 50.691881), Order = c("Psocopt", "Acari", "Dipt", 
"Acari", "Dipt", "Acari", "Lepidopt", "Neoptera", "Lepidopt")), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))
library(openxlsx)
library(ggplot2)

dfb <- data.xlsx     # placeholder for the imported df (different % values per species)
labelb <- dfb$Beech  # character vector of 22 species names; same number as x-values

p <- ggplot(dfb, aes(x = id))
p <- p + geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen")
p <- p + geom_line(aes(y = kommu/10), color = "orange", size = 2) + geom_point(aes(y = kommu/10), size = 2)
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]"))
p <- p + labs(y = "Species [%]",
              x = "Species")
p <- p + scale_x_discrete(labels = labelb)
p <- p + theme(legend.position = c(0.8, 0.9))


Edit, in reply to the comments: my problem is that the bars are not labeled with species names. I know this is a consequence of my dummy variable, which simply runs from 1 to 22. So I tried to force ggplot to label the x-axis with the values I want, but this line doesn't work:

p <- p + scale_x_discrete(labels = labelb)

Back to your suggestion: yes, I tried the tidyverse right after creating this post but couldn't get it to work properly. Your idea doesn't change anything for me; the result is the same as with the plain ggplot command.

dfb %>%
  arrange(Beech) %>%
  mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
  ggplot(aes(Beech, proc)) +
  geom_col()

[Image: unordered bar chart with dplyr]


There are 2 answers

Walker Harrison

I can't quite tell from the picture what's going wrong, but one way to make sure your bars are in ascending or descending order is to arrange the data by the value column and then convert the category column to a factor whose levels follow that order:

So, without ordering:

library(tidyverse)

diamonds %>%
  group_by(cut) %>%
  summarize(price = mean(price)) %>%
  ggplot(aes(cut, price)) +
  geom_bar(stat = "identity")


And with ordering:

diamonds %>%
  group_by(cut) %>%
  summarize(price = mean(price)) %>%
  arrange(price) %>%
  mutate(cut = factor(cut, levels = unique(.$cut))) %>%
  ggplot(aes(cut, price)) +
  geom_bar(stat = "identity")

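Applied to the sample data from the question, the same idea can be sketched in base R. This is only an illustration under the assumption that the bars should follow descending proc; the data frame below is rebuilt from the question's dput output and reduced to the two relevant columns.

```r
# Sample data from the question, reduced to the two columns needed here
dfb <- data.frame(
  Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
            "Stigm", "Di fag", "Phyfa", "Phylma"),
  proc  = c(8.749851, 7.793812, 7.659406, 6.675941, 6.289307,
            3.724406, 3.642574, 3.390545, 2.76604)
)

# A plain character column is plotted in alphabetical order.
# Fixing the factor levels in order of descending proc makes
# ggplot2 draw the bars from high to low instead.
ordered_names <- dfb$Beech[order(dfb$proc, decreasing = TRUE)]
dfb$Beech <- factor(dfb$Beech, levels = ordered_names)

# Inspect the resulting level order
levels(dfb$Beech)
```

With the levels fixed this way, Beech can be mapped to x directly and no numeric dummy variable is needed for the ordering.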

Rhesous

I edited your code using the data sample you provided, and I think I managed to get what you wanted.

Basically, I sorted the data by descending proc and then converted Beech to a factor. Here is the modified code and the result:

library(dplyr)    # provides %>%, arrange(), mutate()
library(ggplot2)

p <-
  dfb %>%
  # sort rows from high to low, then freeze that order as factor levels
  arrange(desc(proc)) %>%
  mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
  ggplot(aes(Beech)) +
  geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen") +
  # numeric x so the line connects across the discrete categories
  geom_line(aes(y = kommu/10, x = as.integer(Beech)), color = "orange", size = 2) +
  geom_point(aes(y = kommu/10), size = 2) +
  labs(y = "Species [%]", x = "Species") +
  scale_x_discrete("Species") +
  scale_y_continuous(sec.axis = sec_axis(~.*10, name = "Total biocoenosis[%]")) +
  theme(legend.position = c(0.8, 0.9))
p  # print the plot


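Note: I had to tweak geom_line slightly by adding x = as.integer(Beech). With a factor on the x-axis each level becomes its own group, so the line would not connect the points; mapping x to the integer codes of the factor gives the line numeric positions to follow.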
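If you would rather keep the numeric id on the x-axis, as in the question, another option is to label a continuous scale. The likely reason the original labels disappeared is that scale_x_discrete was applied to a numeric (continuous) variable. The sketch below is an assumption about what the intended plot looks like, not a tested drop-in replacement: it rebuilds the sample data from the question and passes the species names to scale_x_continuous via breaks and labels.

```r
library(ggplot2)

# Sample data from the question (id already runs from high to low proc)
dfb <- data.frame(
  Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
            "Stigm", "Di fag", "Phyfa", "Phylma"),
  id    = 1:9,
  proc  = c(8.749851, 7.793812, 7.659406, 6.675941, 6.289307,
            3.724406, 3.642574, 3.390545, 2.76604),
  kommu = c(8.749851, 16.543663, 24.203069, 30.87901, 37.168317,
            40.892723, 44.535297, 47.925842, 50.691881)
)

p <- ggplot(dfb, aes(x = id)) +
  geom_col(aes(y = proc), fill = "lightgreen") +
  geom_line(aes(y = kommu / 10), color = "orange") +
  geom_point(aes(y = kommu / 10), size = 2) +
  # id is numeric, so the x scale is continuous: put one break at
  # each id and label it with the matching species name
  scale_x_continuous(breaks = dfb$id, labels = dfb$Beech) +
  scale_y_continuous(sec.axis = sec_axis(~ . * 10, name = "Total biocoenosis [%]")) +
  labs(x = "Species", y = "Species [%]")
```

Because x stays numeric here, geom_line connects the points without any extra grouping or as.integer() conversion.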