Motivation
I have lots of data in the form of (binary) survey responses, where each response is assigned a weight. I'd like to easily explore the data by plotting (weighted) proportions, with (weighted) margins of error.
The stats are not the main issue; the ggplot2 magic I'm attempting is. Below is an example that almost works™ except that I have to wrap a tiny detail inside as.numeric(). Which is icky. And there must be a fun and better way. Please show the way!
Toy Data
For the purpose of this question, the weights are irrelevant, but the plumbing has to work, so here are some "weighted" coin tosses.
toss_data <- data.frame(
  coin_type = c("penny", "nickel"),  # recycled to length 10
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10))  # not really relevant to this debugging issue
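For reference, the numbers the plot should end up showing can be computed directly in base R, without any ggplot2 plumbing. This is only a minimal sketch: set.seed() and the name toss_demo are additions for illustration and are not part of the setup above.

```r
# Illustrative only: a seeded copy of the toy data (toss_demo is a made-up name)
set.seed(1)
toss_demo <- data.frame(
  coin_type = rep(c("penny", "nickel"), 5),
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10))

# Weighted proportion of TRUE outcomes per coin type;
# weighted.mean() treats the logical outcome as 0/1
wtd_prop <- sapply(
  split(toss_demo, toss_demo$coin_type),
  function(d) weighted.mean(d$outcome, d$weight))

print(wtd_prop)
```

Each value is a proportion between 0 and 1, which is why a continuous y scale with limits c(0, 1) is the natural fit for the plot.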
Problem
The code that's needed to set things up is in the next section. Briefly, the problem I seem to run into is this:
- My "input" (outcome) starts life as a discrete variable.
- If I force it to become numeric at the beginning, everything works as expected.
- Otherwise, ggplot2 wants to set things up with a discrete scale for y, which totally makes sense, except that my stat (stat_wtd_mean(), below) yields a continuous y.
- I generally want to follow up with a ... + scale_y_continuous(limits = c(0, 1)), but that of course will then fail with Error: Discrete value supplied to continuous scale.
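The scale choice described in the list above can be checked in isolation. The sketch below assumes that ggplot2's exported scale_type() generic is what decides the default scale for a mapped column; it only inspects that decision and does not fix anything.

```r
library(ggplot2)

outcome <- as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0))

# Default scale type ggplot2 would pick for the raw logical column
type_logical <- scale_type(outcome)

# Default scale type after the as.numeric() coercion
type_numeric <- scale_type(as.numeric(outcome))

print(type_logical)
print(type_numeric)
```

If the two results differ, that would confirm the scale type is settled from the class of the mapped column, before the stat ever produces its continuous y.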
The chunk below creates the desired output. Note the use of as.numeric():
p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = as.numeric(outcome), w = weight)
p + stat_wtd_mean()
But here's what happens when that as.numeric() is omitted, and outcome tries to make the party all about its discrete self:
p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = outcome, w = weight)
p + stat_wtd_mean()
How do I avoid having to wrap outcome inside as.numeric()?
Can I override the default scale in some way? I've looked for ggproto clues but come up empty.
Should I (re)assign or modify aesthetic mappings or scales in some way?
I've never tried to implement my own geom or stat before, but it seems like a useful tool to have under one's belt.
I tried changing stat_wtd_mean() to return list(layer(...), scale_y_continuous()) instead of just layer(...), but that breaks with Error: Discrete value supplied to continuous scale.
Note: I want to do this by using a stat, instead of by writing a wrapper function like chart_wtd_mean(data, ...) that munges the data inside itself. The reason is that I want to exploit other aspects of ggplot2, like faceting or grouping-and-coloring, in a maximally flexible way.
Thanks for any insights!
Reprex
In real life I have a more complex compute_group_data() that yields y, ymin, and ymax. This example is stripped down, but enough to reproduce this issue:
library(tidyverse)
# given y and w, compute some (weighted) summary of y
compute_group_data <- function(data, scales) {
  # group by every positional aesthetic except y, plus panel and group
  grp_vars <- c(setdiff(names(scales), "y"), "PANEL", "group")
  grouped <- group_by(data, across(all_of(grp_vars)))
  df <- summarise(grouped, y = weighted.mean(y, w), .groups = "drop")
  return(df)
}

StatWtdMean <- ggproto(
  "StatWtdMean", Stat,
  compute_group = compute_group_data,
  required_aes = c("y", "w"),
  dropped_aes = c("w"))

stat_wtd_mean <- function(
    mapping = NULL,
    data = NULL,
    geom = "point",
    position = "identity",
    na.rm = FALSE,
    show.legend = NA,
    inherit.aes = TRUE,
    ...
) {
  layer(
    stat = StatWtdMean, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...))
}

toss_data <- data.frame(
  coin_type = c("penny", "nickel"),
  outcome = as.logical(c(0, 0, 1, 1, 0, 1, 1, 1, 0, 0)),
  weight = runif(10))  # not really relevant to this debugging issue

# if as.numeric() is omitted, this breaks (discrete instead of continuous scale)
p <- ggplot(toss_data)
p <- p + aes(x = coin_type, y = as.numeric(outcome), w = weight)
p + stat_wtd_mean() + scale_y_continuous(limits = c(0, 1))