add geom_hline legend without screwing up geom_line legend


I'm trying to draw a simple scree plot with some extra geom_hline and geom_vline layers thrown in.

The problem is: whenever I so much as add show_guide = TRUE or map something via aes() in the geom_vline/geom_hline layer, I screw up the original legend.

Here's some (ugly) fake data:

```r
exdf <- data.frame(PC = rep(1:12, times = 3), variable = rep(c("A", "B", "C"), times = 6),
                   eigenvalue = rnorm(36), stringsAsFactors = FALSE)  # variable is recycled to 36 rows
```

And here's my plot:

```r
library(ggplot2)
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
# show_guide is called show.legend in ggplot2 >= 2.0.0
g <- g + geom_vline(xintercept = 7, colour = "green", show_guide = TRUE)
g
```


How do I add a separate legend for the geom_vline without polluting the other legend?

I can't wrap my head around why one layer's color would change another layer's legend.


Answer by Nick Kennedy:

This partly solves the problem:

```r
library(ggplot2)
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
# Map colour to a constant so the vline gets its own colour legend (show_guide = show.legend in ggplot2 >= 2.0.0)
g <- g + geom_vline(aes(xintercept = x, colour = Threshold), data = data.frame(x = 7, Threshold = "A"),
                    show_guide = TRUE) + scale_colour_manual(values = c(A = "green"))
g
```


But the variable (linetype) legend will still show crosses, i.e. the vertical vline key drawn over each line key, albeit not green ones.

Alternatively, you could use a geom_line with a new two-row data frame: both rows have the same x, and y is set to the lower and upper bounds of your data, respectively. This gives you a legend with a horizontal green line for your threshold and no vertical lines.
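A minimal sketch of that two-row geom_line alternative, assuming a current ggplot2 (3.x) and the same kind of fake data as in the question; threshold_line and the "Threshold-A" label are hypothetical names made up for this example. The threshold layer maps only colour, so it should get its own colour legend with a horizontal key and leave the linetype legend alone.

```r
library(ggplot2)

set.seed(1)  # only so the fake data are reproducible
exdf <- data.frame(PC = rep(1:12, times = 3),
                   variable = rep(c("A", "B", "C"), times = 6),  # recycled to 36 rows
                   eigenvalue = rnorm(36),
                   stringsAsFactors = FALSE)

# Hypothetical helper data: two rows at the same x, y spanning the data range
threshold_line <- data.frame(PC = c(7, 7),
                             eigenvalue = range(exdf$eigenvalue),
                             threshold = "Threshold-A")

g <- ggplot(exdf, aes(x = factor(PC), y = eigenvalue)) +
  geom_line(aes(group = factor(variable), linetype = variable)) +
  # Numeric x = 7 is drawn at the 7th position of the discrete factor(PC) axis;
  # this layer maps colour but not linetype, so it stays out of the linetype legend
  geom_line(data = threshold_line,
            aes(x = PC, y = eigenvalue, colour = threshold)) +
  scale_colour_manual(values = c("Threshold-A" = "green"))
print(g)
```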

Answer by maxheld:

Based on @Nick K's suggestion above, here's a way to get clean legends by giving each layer its own data = argument.

```r
library(ggplot2)
exdf <- data.frame(rep(x = 1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
g <- ggplot()  # keep the base plot empty; each layer gets its own data
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue,
                                              group = factor(variable), linetype = variable))
g
thresholds <- data.frame(threshold = "Threshold-A", PC = 7,
                         ymin = min(exdf$eigenvalue), ymax = max(exdf$eigenvalue))
g <- g + geom_linerange(data = thresholds,
                        mapping = aes(x = PC, ymin = ymin, ymax = ymax, color = threshold))
g
```

yields:

[Image: plot with a clean legend]

Notice:

  • I know the original data in exdf are dumb and make an ugly plot; that's not the point here.
  • Note that I set data = separately for each layer and kept the initial g <- ggplot() blank; otherwise the threshold layer inherits the global aes() mapping (e.g. y = eigenvalue), which doesn't exist in thresholds, and ggplot2 stops with an error.
  • Yes, it's a hack job (see below), and the line doesn't fill the full y-height of the plot the way a geom_vline does.
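The inheritance problem from the second note can also be handled the other way round: keep exdf and the global aes() in ggplot() and pass inherit.aes = FALSE (a standard ggplot2 layer argument) to the threshold layer, so it never looks up y = eigenvalue. A sketch assuming ggplot2 3.x; thresholds is the same data frame as in the answer, redefined so the snippet runs on its own.

```r
library(ggplot2)

set.seed(1)  # reproducible fake data
exdf <- data.frame(PC = rep(1:12, times = 3),
                   variable = rep(c("A", "B", "C"), times = 6),
                   eigenvalue = rnorm(36),
                   stringsAsFactors = FALSE)
thresholds <- data.frame(threshold = "Threshold-A", PC = 7,
                         ymin = min(exdf$eigenvalue),
                         ymax = max(exdf$eigenvalue))

g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue)) +
  geom_line(mapping = aes(group = factor(variable), linetype = variable)) +
  # inherit.aes = FALSE: ignore the global x/y mapping, so eigenvalue is
  # never evaluated against `thresholds` (where that column does not exist)
  geom_linerange(data = thresholds,
                 mapping = aes(x = PC, ymin = ymin, ymax = ymax, colour = threshold),
                 inherit.aes = FALSE)
print(g)
```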

As an add-on (not a solution!), here's how it should work with geom_vline:

```r
library(ggplot2)
exdf <- data.frame(rep(x = 1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
thresholds <- data.frame(threshold = "Threshold-A", PC = 7)  # ymin/ymax aren't needed for geom_vline
g <- ggplot()
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue, group = factor(variable), linetype = variable))
g
# show_guide is called show.legend in ggplot2 >= 2.0.0
g + geom_vline(data = thresholds, mapping = aes(xintercept = PC, color = threshold), show_guide = TRUE)
```

yields:

[Image: messy legend via geom_vline]

That fills the full y-height, as you would expect from geom_vline, but somehow messes up the variable legend (notice the vertical lines).

Not sure why this happens; it feels like a bug to me. Reported here: https://github.com/hadley/ggplot2/issues/1267
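In current ggplot2 (3.x), show_guide has been replaced by show.legend, and as far as I can tell a layer whose show.legend is left at its default (NA) only contributes keys to the legends of aesthetics it actually maps, while an explicit TRUE adds its key to every legend, which would account for the vertical lines in the variable legend. A sketch under that assumption, reusing the fake data; thresholds mirrors the data frame above, and key_glyph = "path" (available since ggplot2 3.2.0) is optional and only swaps the vertical key for a horizontal one.

```r
library(ggplot2)

set.seed(1)  # reproducible fake data
exdf <- data.frame(PC = rep(1:12, times = 3),
                   variable = rep(c("A", "B", "C"), times = 6),
                   eigenvalue = rnorm(36),
                   stringsAsFactors = FALSE)
thresholds <- data.frame(threshold = "Threshold-A", PC = 7)

g <- ggplot(exdf, aes(x = factor(PC), y = eigenvalue)) +
  geom_line(aes(group = factor(variable), linetype = variable)) +
  # No show.legend = TRUE here: with the default (NA), this layer only appears
  # in the colour legend, because it maps xintercept and colour but not linetype
  geom_vline(data = thresholds,
             aes(xintercept = PC, colour = threshold),
             key_glyph = "path") +
  scale_colour_manual(values = c("Threshold-A" = "green"))
print(g)
```

Unlike the geom_linerange workaround, the vline still spans the full height of the panel.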