I would like to add shaded standard error curves to geom_density
using ggplot2
. My code looks like this:
data.plot <- data.frame(x = c(rnorm(100, mean = 0, sd = 5),
rnorm(100, mean = 1, sd =2 )),
g = factor(c(rep(1, 100),
rep(2,100))))
ggplot(data.plot, aes(x, linetype = g)) + geom_density()
I couldn't find a tutorial or examples to do so. Thank you.
The best solution is with bootstrapping, as mentioned in the comments. I'll use the classic iris data, focusing on the density of Sepal.Length for each Species.
Generate samples
So I just made 1000 bootstrap replicates of the original data, taking the same number of rows out of each group as its original sample size, with replacement. Now I have to
unnest
to access the data in the nested data column.Compute within-sample density
So we expanded the data to its long form, then took the density of each group's Sepal.Length within Species within bs. We had to supply a manual
from =
andto =
since the min and max within each bootstrap might differ (and set a lowern =
than the default 512 just to save time). To simplify theS3: density
object that's generated, we usebroom::tidy
. This is the computationally intensive step, so we'll save this object as densities.within.Summarize densities into quantiles
This results in columns named x and y, but we'll rename these to match our data. Then we'll figure out: for the densities calculated at each computed possible Sepal.Length, what is the lower end of the CI, the median, and the upper end of the CI? We'll use
quantile
to get these specific values of the computed densities.Visualize and compare
I included histogram and density layers for the original data as well for comparison. You can see that the median and raw densities are very close with 1000 bootstrap samples.