Have scale type correspond to stat output (continuous), not input (discrete)

Motivation

I have lots of data in the form of (binary) survey responses, where each response is assigned a weight. I'd like to easily explore the data by plotting (weighted) proportions, with (weighted) margins of error.

The stats are not the main issue; the ggplot2 magic I'm attempting is. Below is an example that almost works™ except that I have to wrap a tiny detail inside as.numeric(). Which is icky. And there must be a fun and better way. Please show the way!

Toy Data

For the purpose of this question, the weights are irrelevant, but the plumbing has to work, so here are some "weighted" coin tosses.

toss_data <- data.frame(
  coin_type = c("penny", "nickel"),
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10)) # not really relevant to this debugging issue

Problem

The code that's needed to set things up is in the next section. Briefly, the problem I seem to run into is this:

  • My "input" (outcome) starts life as a discrete variable.
  • If I force it to become numeric at the beginning, everything works as expected.
  • Otherwise, ggplot2 wants to set things up with a discrete scale for y, which totally makes sense, except that my stat (stat_wtd_mean(), below) yields a continuous y.
  • I generally want to follow up with a ... + scale_y_continuous(limits = c(0, 1)) but that of course will then fail with Error: Discrete value supplied to continuous scale.
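
The mismatch described in the list above seems to come down to how a default scale is chosen: as far as I can tell, ggplot2 picks it from the class of the mapped input vector, before the stat has computed anything. A minimal sketch, assuming ggplot2's exported scale_type() generic is what drives that choice; the names type_raw and type_num are only illustrative:

```r
library(ggplot2)

# Same outcome vector as in the toy data: a logical
outcome <- as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0))

# scale_type() dispatches on the class of the mapped vector
type_raw <- scale_type(outcome)              # logical input
type_num <- scale_type(as.numeric(outcome))  # numeric input

print(type_raw)
print(type_num)
```

If that assumption holds, the logical input is classified as discrete and the as.numeric() version as continuous, which would match the behaviour described above.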

The chunk below creates the desired output. Note the use of as.numeric():

p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = as.numeric(outcome), w = weight) 
p + stat_wtd_mean()

[Figure: desired output]

But here's what happens when that as.numeric() is omitted, and outcome tries to make the party all about its discrete self:

p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = outcome, w = weight) 
p + stat_wtd_mean()

[Figure: plot produced when as.numeric() is omitted]

How do I avoid having to wrap outcome inside as.numeric()?

Can I override the default scale in some way? I've looked for ggproto clues but come up empty.

Should I (re)assign or modify aesthetic mappings or scales in some way?

I've never tried to implement my own geom or stat before, but it seems like a useful tool to have under one's belt.

I tried changing stat_wtd_mean() to return list(layer(...), scale_y_continuous()) instead of just layer(...), but that breaks with Error: Discrete value supplied to continuous scale.

Note: I want to do this by using a stat, instead of by writing a wrapper function like chart_wtd_mean(data, ...) that munges the data inside itself. The reason is that I want to exploit other aspects of ggplot2, like faceting or grouping-and-coloring, in a maximally flexible way.

Thanks for any insights!

Reprex

In real life I have a more complex compute_group_data() that yields y, ymin, and ymax. This example is stripped down, but enough to reproduce this issue:

library(tidyverse)

# given y and w, compute some (weighted) summary of y
compute_group_data <- function(data, scales) {
  grp_vars <- c(setdiff(names(scales), "y"), "PANEL", "group")
  grouped <- group_by(data, across(all_of(grp_vars)))
  df <- summarise(grouped, y = weighted.mean(y, w), .groups = "drop")
  return(df)
}

StatWtdMean <- ggproto(
  "StatWtdMean", Stat,
  compute_group = compute_group_data,
  required_aes = c("y", "w"),
  dropped_aes = c("w"))

stat_wtd_mean <- function (
    mapping = NULL,
    data = NULL,
    geom = "point",
    position = "identity",
    na.rm = FALSE,
    show.legend = NA,
    inherit.aes = TRUE,
    ...
) {
  layer(
    stat = StatWtdMean, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...))
}

toss_data <- data.frame(
  coin_type = c("penny", "nickel"),
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10)) # not really relevant to this debugging issue

# if as.numeric() is omitted, this breaks (discrete instead of cont scale)
p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = as.numeric(outcome), w = weight) 
p + stat_wtd_mean() + scale_y_continuous(limits = c(0, 1))
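
As a sanity check that does not involve ggplot2 at all, the weighted proportions the stat is expected to produce can be computed in base R. This is a minimal sketch on the same toy data; the seed and the name wtd_prop are my own additions for illustration, not part of the reprex above:

```r
set.seed(1)  # hypothetical seed, only so the weights are reproducible

toss_data <- data.frame(
  coin_type = c("penny", "nickel"),
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10))

# weighted.mean() multiplies x by the weights, so a logical outcome is
# coerced to 0/1 and the result is a weighted proportion per coin type
wtd_prop <- sapply(
  split(toss_data, toss_data$coin_type),
  function(d) weighted.mean(d$outcome, d$weight))

print(wtd_prop)
```

The result is a numeric value between 0 and 1 for each coin type, which is why a continuous y scale is the natural fit for the stat's output even though the input is logical.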