ggplot extension function to plot a superimposed mean in a scatterplot

352 views Asked by At

I am trying to create a custom function that extends ggplot2. The goal of the function is to superimpose a mean with horizontal and vertical standard errors. The code below does the entire thing.

library(plyr)
library(tidyverse)

summ <- ddply(mtcars,.(),summarise,
              dratSE = sqrt(var(drat))/length(drat),
              mpgSE = sqrt(var(mpg))/length(mpg),
              drat = mean(drat),
              mpg = mean(mpg))

ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = 'black', color = 'white', size = 3) + 
  geom_errorbarh(data = summ, aes(xmin = drat - dratSE, xmax = drat + dratSE)) +
  geom_errorbar(data = summ, aes(ymin = mpg - mpgSE, ymax = mpg+mpgSE), width = .1) +
  geom_point(data = summ, color='red',size=4) 

Ideally, it would only take a function such as geom_scattermeans() to do this whole thing. But I am not sure how the aesthetics get transferred into subsequent geom functions from ggplot().

Also I've had difficulties in making a function that receives column names as argument and making it work with ddply(). enter image description here

2

There are 2 answers

3
Justin Landis On BEST ANSWER

Since my first answer is still the easier solution, I decieded to keep it. This answer should get OP closer to their goal.

Building a ggproto object can be cumbersome depending on what you are trying to do. In your case you are combining 3 ggproto Geoms classes together with the possibility of a new Stat.

The three Geoms are:

  • GeomErrorbar
  • GeomErrorbarh
  • GeomPoint

To get started, sometimes you just need to inherit from one of the classes and overwrite the method, but to pool the three together you will need to do more work.

Lets first consider how each of these Geoms draw their grid objects. Depending on the Geom it is in one of these functions draw_layer(), draw_panel(), and draw_group(). Fortunately, each of the geoms we want to use only use draw_panel() which mean a bit less work for us - we will just call these methods directly and build a new grobTree object. We will just need to be careful that all the correct parameters are making it to our new Geom's draw_panel() method.

Before we get to writing our own draw_panel, we have to first consider setup_params() and setup_data() functions. Occasionally, these will modify the data right out the gate. These steps are usually helpful to have automatic processing here and are often used to standardize/transform the data. A good example is GeomTile and GeomRect, they are essentially the same Geoms but their setup_data() functions differ because they are parameterized differently.

Lets assume you only want to assign an x and a y aesthetic, and leave the calculations of xmin, ymin, xmax, and ymax to the geoms/stats.

Fortunately, GeomPoint just returns the data with no modifications, so we will need to incorporate GeomErrorbar and GeomErrorbarh's setup_data() first. To skip some steps, I am just going to make a new Stat that will take care of transforming those values for us within a compute_group() method.

A note here, GeomErrorbar and GeomErrorbarh allow for another parameter to be included - width and height respectively, which controls how wide the flat portions of the error bars are.

also, within these functions, each will make their own xmin, xmax, ymin, ymax - so we will need to distinguish these parameters.

First load required information into the namespace

library(ggplot2)
library(grid)
"%||%" <- ggplot2:::`%||%`

Start with new Stat, I've decided to call it a PointError

StatPointError <- ggproto(
  "StatPointError",
  Stat,
  #having `width` and `height` as named parameters here insure
  #that they will be available to the `Stat` ggproto object.
  compute_group = function(data, scales, width = NULL, height = NULL){
    data$width <- data$width %||% width %||% (resolution(data$x, FALSE)*0.9)
    data$height <- data$height %||% height %||% (resolution(data$y, FALSE)*0.9)
    
    data <- transform(
      data,
      x = mean(x),
      y = mean(y),
      # positions for flat parts of vertical error bars
      xmin = mean(x) - width /2,
      xmax = mean(x) + width / 2,
      width = NULL, 
      # y positions of vertical error bars
      ymin = mean(y) - sqrt(var(y))/length(y),
      ymax = mean(y) + sqrt(var(y))/length(y),
      #positions for flat parts of horizontal error bars
      ymin_h = mean(y) - height /2,
      ymax_h = mean(y) + height /2,
      height = NULL,
      # x positions of horizontal error bars
      xmin_h = mean(x) - sqrt(var(x))/length(x),
      xmax_h = mean(x) + sqrt(var(x))/length(x)
    )
    unique(data)
  }
)

Now for the fun part, the Geom, again I'm going for PointError as a consistent name.

GeomPointError <- ggproto(
  "GeomPointError",
  GeomPoint,
  #include some additional defaults
  default_aes = aes(
    shape = 19,
    colour = "black",
    size = 1.5, # error bars have defaults of 0.5 - you may want to add another parameter?
    fill = NA,
    alpha = NA,
    linetype = 1,
    stroke = 0.5, # for GeomPoint
    width = 0.5,  # for GeomErrorbar
    height = 0.5, # for GeomErrorbarh
  ),
  draw_panel = function(data, panel_params, coord, width = NULL, height = NULL, na.rm = FALSE) {
    #make errorbar grobs
    data_errbar <- data
    data_errbar[["size"]] <- 0.5
    errorbar_grob <- GeomErrorbar$draw_panel(data = data_errbar,
                                             panel_params = panel_params, coord = coord, 
                                             width = width, flipped_aes = FALSE)
    #re-parameterize errbarh data
    data_errbarh <- transform(data, 
                              xmin = xmin_h, xmax = xmax_h, ymin = ymin_h, ymax = ymax_h,
                              xmin_h = NULL, xmax_h = NULL, ymin_h = NULL, ymax_h = NULL,
                              size = 0.5)
    #make errorbarh grobs
    errorbarh_grob <- GeomErrorbarh$draw_panel(data = data_errbarh,
                                               panel_params = panel_params, coord = coord,
                                               height = height)
    point_grob <- GeomPoint$draw_panel(data = data, panel_params = panel_params,
                                       coord = coord, na.rm = na.rm)
    gt <- grobTree(
      errorbar_grob,
      errorbarh_grob,
      point_grob, name = 'geom_point_error')
    gt
  }
)

Last, we need a function for the user to call that will make a Layer object.

geom_point_error <- function(mapping = NULL, data = NULL,
                             position = "identity",
                             ...,
                             na.rm = FALSE,
                             show.legend = NA,
                             inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatPointError,
    geom = GeomPointError,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      ...
    )
  )
}

Now we can test if this is working properly

ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
  geom_point_error(color = "red", width = .1, height = .3)

ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
  geom_point_error(aes(color = hp>100))

Created on 2021-05-18 by the reprex package (v1.0.0)

There is obviously so much more you could do with this, from including additional default aesthetics such that you could control the color and size of the lines/points separately (may want to override GeomPointError$setup_data() to insure everything maps correctly).

Finially, this geom is pretty naive in that it assumes the x and y data mappings are continuous. It still works with mixing continuous and discrete, but looks a bit funky

ggplot(mpg, aes(cty, model)) +
    geom_point() +
    geom_point_error(color = 'red')

1
Justin Landis On

I think plyr is pretty defunct at this point. I would recommend the dplyr package. When programming with dplyr you can use {{ (curly-curly, or embracing as the documentation says) to properly quote expressions.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

geom_point_error <- function(data, x, y, color = 'red', size = 4) {
  
  data <- dplyr::summarise(
    data,
    x_se = sqrt(var({{x}}))/length({{x}}),
    y_se = sqrt(var({{y}}))/length({{y}}),
    x = mean({{x}}),
    y = mean({{y}})
  )
  
  list(
    geom_errorbarh(data = data,
                   mapping = aes(y = y,
                                 xmin = x - x_se, xmax = x + x_se), inherit.aes = F),
    geom_errorbar(data = data,
                  mapping = aes(x = x,
                                ymin = y - y_se, ymax = y + y_se), width = .1,inherit.aes = F),
    geom_point(data = data,
               mapping = aes(x = x, y = y),
               color = color, size = size)
  )
}

ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = 'black', color = 'white', size = 3) + 
  geom_point_error(mtcars, x = drat, y = mpg)

Created on 2021-05-17 by the reprex package (v1.0.0)

The second option would be to build your own ggproto Geom to handle these calculations inside ggplot2, but that is a bit much for right now.