Custom x-axis labels not applying correct custom x-axis colours


In my df, I need the two x-axis labels with "subset" = "initial" to be coloured grey50 (like the two bars on the left) and the two x-axis labels with "subset" = "processed" to be coloured midnightblue (like the two bars on the right).

Code:

df_TP <- data.frame(
  categ = c("initial_t_tp", "processed_t_tp", "initial_v_tp", "processed_v_tp"),
  group = rep("TP", 4),
  absolute = c(86, 85, 21, 21),
  percentage = c(84.16, 84.16, 84.00, 84.00),
  col = c(0, 1, 0, 1),
  subset = c("initial", "processed", "initial", "processed")
)

labels_tp <- c(
  "initial_t_tp" = "Training",
  "initial_v_tp" = "Validation",
  "processed_v_tp" = "Validation",
  "processed_t_tp" = "Training"
)

a <- ifelse(df_TP$col == 0, "grey50", "midnightblue")

library(ggplot2)

# Create the bar plot with separation
ggplot(
  df_TP,
  aes(x = categ, y = absolute, fill = subset)
) +
geom_col(aes(fill = subset)) +
labs(x = NULL, y = "# detected events") +
scale_fill_manual(
  values = c("initial" = "grey50", "processed" = "midnightblue")
) +
theme_minimal() +
theme(
  axis.title.y = element_text(
    size = 30, color = "grey50",
    vjust = 2, hjust = 0.95),
  axis.text.y = element_text(size = 25, colour = "grey50"),
  axis.text.x = ggtext::element_markdown(
    size = 25, vjust = 3, colour = a
  ),
  strip.text = element_blank(),
  panel.grid.major.y = element_line(size = 0.5, color = "grey85"),
  panel.grid.major.x = element_blank(),
  panel.grid.minor = element_blank(),
  legend.position = "none",
  plot.margin = margin(l = 20, 0, 0, 0),
  aspect.ratio = 1/0.9
) +
coord_cartesian(
  ylim = c(0, 90), clip = 'off'
) +
facet_grid(
  . ~ subset, scales = "free_x", switch = "x",
) +
scale_y_continuous(
  limits = c(0, 90), breaks = seq(0, 90, by = 30)
) +
scale_x_discrete(labels = labels_tp)

This outputs:

(Image: the resulting bar plot.)

I'm unable to understand why only Validation is being coloured (both Validation from Initial and Validation from Processed). As seen in my dataframe, the only way it can separate between Training and Validation is by considering a threshold on the y-axis values (Validation is always lower than Training).

I have tried applying the solution here for both a categorical and a numerical condition (which is the sole reason I created "col" with 0 and 1). I tried the ifelse with both "col" and "subset", with no luck.

I have also tried the convoluted solution by Ben in this post:

cols <- c(
  "initial_t_tp" = "grey50",
  "initial_v_tp" = "grey50",
  "processed_v_tp" = "midnightblue",
  "processed_t_tp" = "midnightblue"
)

colour = cols[as.character(df_TP$categ[order(df_TP$categ)])]

But no luck.

I understand this post is prone to downvotes because this question has been asked before; I have spent the last two hours trying to avoid an extra post. At this point, I'm making mistakes out of tiredness.

I am assuming my code has a few more elements than the other questions asked here, and that something is overriding the intended logic in favour of either only Training or only Validation. But what?

Answer by r2evans (best answer)

Using external vectors (a here) in non-NSE (non-standard evaluation) elements of ggplot2 expressions can be problematic, since the order in which a is applied is not necessarily (and often is not at all) the same as the order of the bars on the axis. I suggest putting the colors into the frame itself.
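To see the mismatch concretely, here is a minimal sketch using only base R and the question's data. It compares the row order that a follows with the sorted order ggplot2 uses by default for a discrete axis built from a character column. The names axis_order and wanted are mine, introduced only for illustration. With free_x facets each panel draws just its own labels, so the exact symptom can differ per panel, but the underlying ordering problem is the same.

```r
df_TP <- data.frame(
  categ = c("initial_t_tp", "processed_t_tp", "initial_v_tp", "processed_v_tp"),
  col = c(0, 1, 0, 1)
)

# The external vector follows the row order of the data frame
a <- ifelse(df_TP$col == 0, "grey50", "midnightblue")

# A character column on a discrete axis is ordered like sorted factor levels
axis_order <- levels(factor(df_TP$categ))

# Colours looked up by name, in axis order ("wanted" is a hypothetical name)
wanted <- setNames(a, df_TP$categ)[axis_order]

# Compare what is applied position by position with what was intended
print(rbind(axis_order, applied = a, wanted = unname(wanted)))
```

The second and third positions disagree between applied and wanted, which is why a positional colour vector cannot be trusted here.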

I'm inferring that you want "Training" before "Validation", so we'll need to control the order using factor as well.

By "baking" (my word) the axis label into the data itself, we can (a) include its color, (b) control its order, and (c) remove the need to change the labels with scale_x_discrete.

(No longer using a or labels_tp.)

Here's the "baked in data":

library(dplyr)
df_TP |>
  mutate(
    xcol = if_else(col == 0, "grey50", "midnightblue"),
    xaxs = if_else(grepl("_t_", categ), "Training", "Validation"),
    xaxs = reorder(sprintf("<span style = 'color:%s;'>%s</span>", xcol, xaxs),
                   match(xaxs, c("Training", "Validation")))
  )
#            categ group absolute percentage col    subset         xcol                                                  xaxs
# 1   initial_t_tp    TP       86      84.16   0   initial       grey50         <span style = 'color:grey50;'>Training</span>
# 2 processed_t_tp    TP       85      84.16   1 processed midnightblue   <span style = 'color:midnightblue;'>Training</span>
# 3   initial_v_tp    TP       21      84.00   0   initial       grey50       <span style = 'color:grey50;'>Validation</span>
# 4 processed_v_tp    TP       21      84.00   1 processed midnightblue <span style = 'color:midnightblue;'>Validation</span>

(I'm using dplyr here, though this can be easily adapted to base R in three steps: (1) change mutate to transform, (2) change if_else to ifelse, and (3) break the second xaxs assignment into a new |> transform(..), since transform does not "see" the previous definition of xaxs.)
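For reference, a sketch of that base R adaptation might look like the following. It assumes R 4.1 or later for the native pipe, and the result name df_base is hypothetical; the rest follows the three steps described above.

```r
df_TP <- data.frame(
  categ = c("initial_t_tp", "processed_t_tp", "initial_v_tp", "processed_v_tp"),
  group = rep("TP", 4),
  absolute = c(86, 85, 21, 21),
  percentage = c(84.16, 84.16, 84.00, 84.00),
  col = c(0, 1, 0, 1),
  subset = c("initial", "processed", "initial", "processed")
)

# "df_base" is a hypothetical name for the base R result
df_base <- df_TP |>
  transform(
    xcol = ifelse(col == 0, "grey50", "midnightblue"),
    xaxs = ifelse(grepl("_t_", categ), "Training", "Validation")
  ) |>
  # Second transform() so that the xaxs defined above is visible here
  transform(
    xaxs = reorder(sprintf("<span style = 'color:%s;'>%s</span>", xcol, xaxs),
                   match(xaxs, c("Training", "Validation")))
  )

# Levels of the label factor: the "Training" labels come first
levels(df_base$xaxs)
```

The resulting frame can be piped into the same ggplot() call as the dplyr version.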

With this, we can remove the dependence on a and labels_tp, and instead tell ggplot and ggtext to format the axis labels directly.

library(ggplot2)

# Create the bar plot with separation
df_TP |>
  mutate(
    xcol = if_else(col == 0, "grey50", "midnightblue"),
    xaxs = if_else(grepl("_t_", categ), "Training", "Validation"),
    xaxs = reorder(sprintf("<span style = 'color:%s;'>%s</span>", xcol, xaxs),
                   match(xaxs, c("Training", "Validation")))
  ) |>
  ggplot(
    # df_TP,
    aes(x = xaxs, y = absolute, fill = subset)
  ) +
  geom_col(aes(fill = subset)) +
  labs(x = NULL, y = "# detected events") +
  scale_fill_manual(
    values = c("initial" = "grey50", "processed" = "midnightblue")
  ) +
  theme_minimal() +
  theme(
    axis.title.y = element_text(
      size = 30, color = "grey50",
      vjust = 2, hjust = 0.95),
    axis.text.y = element_text(size = 25, colour = "grey50"),
    axis.text.x = ggtext::element_markdown(
      size = 25, vjust = 3 #, colour = a
    ),
    strip.text = element_blank(),
    panel.grid.major.y = element_line(size = 0.5, color = "grey85"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    plot.margin = margin(l = 20, 0, 0, 0),
    aspect.ratio = 1/0.9
  ) +
  coord_cartesian(
    ylim = c(0, 90), clip = 'off'
  ) +
  facet_grid(
    . ~ subset, scales = "free_x", switch = "x"
  ) +
  scale_y_continuous(
    limits = c(0, 90), breaks = seq(0, 90, by = 30)
  ) #  +
  # scale_x_discrete(labels = labels_tp)

(Image: the resulting plot, with corrected label colours.)