I have one long dataset that is composed of several datasets resulting from multiple imputations (let's say 10 imputations). They have an id variable identifying the imputation. On each of these imputed datasets I would like to bootstrap 10 datasets. After the bootstrap, I want to run models on each of the 100 imputation-bootstrap combinations.
In this example I am not sure whether to use the broom::bootstrap() function or the modelr::bootstrap() function. Furthermore, the grouping seems to be lost in my pipeline.
Here is a reproducible example using the mtcars dataset:
library(tidyverse)
library(broom)
cars <- mtcars %>%
  mutate(am = as.factor(am)) %>% # This is standing in for my imputation id variable
  group_by(am)
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
As you can see, the output currently shows two groups, as it should. In my dataset it would show 10, one for each imputed dataset. Now:
cars2 <- cars %>%
  broom::bootstrap(10, by_group = TRUE)
cars2
Source: local data frame [32 x 11]
Groups: replicate [10]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Now it looks as though there are only 10 groups, one per replicate. It didn't seem to preserve the prior grouping. At this point I would expect 20 total groups (2 x 10).
If I now do this:
cars3 <- cars2 %>%
  group_by(am)
cars3
Source: local data frame [32 x 11]
Groups: am [2]
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dbl> <dbl>
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Now it seems like there are no replicates, only groups for am.
Is there any way to do the bootstrapping after I've grouped my original dataset? Also, ideally, after I bootstrap there should be an id that indicates which bootstrapped dataset I'm looking at.
In my ideal world my code should be able to do something like this:
cars <- mtcars %>%
  mutate(am = as.factor(am)) %>%
  group_by(am) %>%
  bootstrap(10, by_group = TRUE) %>%
  nest() %>% # create a condensed tidy dataset that has one row per imputation, bootstrap combo
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .))) # Create a model for each row (wt is just an example predictor)
I'm in the midst of trying to learn both modelr and purrr, and they're really giving me a headache. I think I finally figured this one out, though.

1. Group the data frame, then within each group create 10 nested bootstrap replicates.
2. Regroup and unnest to expand into two columns, one for the bootstraps and one for an id.
3. Group down to the lowest level of replication and create models. You have to use as.data.frame on the strap column to re-expand it to usable data. See ?resample. This one took me forever to figure out. It should just work like tidyr::unnest.
4. Call your function/summary on each model.
5. Visualize.

Note that I upped the number of bootstraps to 1000, which takes about 10 s.
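The steps above can be sketched roughly as follows. This is a minimal reconstruction under some assumptions, not necessarily the exact code that was originally posted: it assumes a reasonably recent tidyr (1.0 or later) and uses modelr::bootstrap(); the predictor wt, the object names (boots, models, coefs, slopes) and n_boot are placeholders chosen for illustration. The visualization step is left out, and a numeric summary stands in for it.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(modelr)
library(broom)

set.seed(42)   # make the resampling reproducible
n_boot <- 10   # hypothetical name; raise (e.g. to 1000) for real use

# Steps 1-2: nest each group, bootstrap within it, then unnest so that
# every row is one (am, .id) combination holding a resample in `strap`.
boots <- mtcars %>%
  mutate(am = as.factor(am)) %>%   # stand-in for the imputation id
  group_by(am) %>%
  nest() %>%
  ungroup() %>%
  mutate(boot = map(data, ~ modelr::bootstrap(.x, n_boot))) %>%
  select(-data) %>%
  unnest(boot)

# Step 3: fit one model per replicate. as.data.frame() expands the
# resample object back into ordinary rows. `wt` is an assumed predictor.
models <- boots %>%
  mutate(model = map(strap, ~ lm(mpg ~ wt, data = as.data.frame(.x))))

# Step 4: tidy every model into one long table of coefficients.
coefs <- models %>%
  mutate(coef = map(model, broom::tidy)) %>%
  select(am, .id, coef) %>%
  unnest(coef)

# Stand-in for the visualization: spread of the slope within each group.
slopes <- coefs %>%
  filter(term == "wt") %>%
  group_by(am) %>%
  summarise(mean_slope = mean(estimate), sd_slope = sd(estimate))
```

Here boots has one row per group-replicate pair, and the .id column created by modelr::bootstrap() serves as the bootstrap identifier asked for in the question.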