I am relying on the compareGroups package to do some comparisons after a pipe-chain. When subsetting the final results, the call to [ triggers a call to update (both in their bespoke compareGroups-versions) which leads to a scoping problem.
Try this:
library(tidyverse)
# install.packages("compareGroups")
library(compareGroups)
get_data <- function() return(mtcars)
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}
get_results <- function(){
  get_data() %>% assign_group %>% compareGroups(group ~ ., data = .)
}
res <- get_results()
# all the above works, but the following triggers the error:
res["mpg"]
This leads to the following error:
Error in compareGroups(formula = group ~ mpg, data = .) : object '.' not found
The relevant (abbreviated) traceback is this:
compareGroups(formula = group ~ mpg, data = .)
eval(call, parent.frame())
update.compareGroups(x, formula = group ~ mpg)
update(x, formula = group ~ mpg) at <text>#1
eval(parse(text = cmd))
`[.compareGroups`(res, "mpg")
res["mpg"]
So, my understanding is that the dot notation in the dplyr pipe chain prevents the update call from finding the dataframe, which is stored as . in the call. The error makes sense: . is neither the name of the dataframe nor available outside the scope of the function get_results (though the main issue is the .). One obvious way of avoiding this error is to fix the update.compareGroups function: I don't think we need another call to the package to redo all calculations when I simply want to retrieve individual results (which have already been calculated).
However, this is a more general issue with the . notation of dplyr and the fact it is stored in the call. This problem seems general enough so that I would imagine someone has encountered it before, and has found a more general solution?
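To illustrate that the scoping half of the problem is not specific to compareGroups or the pipe, here is a minimal base R sketch. It uses lm() purely as a stand-in (an assumption for illustration; the traceback above suggests update.compareGroups re-evaluates the stored call in a similar way), and fit_inside and local_df are hypothetical names:

```r
# Base R only: lm() stands in for compareGroups() in this sketch.
fit_inside <- function() {
  local_df <- mtcars               # exists only inside this function's frame
  lm(mpg ~ wt, data = local_df)    # the fitted object records this call
}

fit <- fit_inside()
fit$call                           # the stored call refers to local_df by name

# update() re-evaluates the stored call in the caller's frame, where
# local_df does not exist, so the refit fails.
refit <- try(update(fit, . ~ . + hp), silent = TRUE)
inherits(refit, "try-error")
```

The stored call only holds the name of the data, so any later re-evaluation depends on that name being visible from wherever update() is called.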
Firstly, I don't think piping your data into compareGroups makes sense. Remember that piping by default makes the dataframe the first argument to compareGroups(), even though the function expects a formula as its first argument.
Secondly, this dplyr vignette shows you can use .data instead of just . to access the piped data. However, in this case that will cause a crash with the message "data argument will be ignored since formula is already a data set" (due to the data being piped into the first argument).
Making a separate call to compareGroups without piping then gets me into an unholy mess of environments whereby res does not have access to the data when requesting res['mpg'] outside the function get_results(), as you already alluded to with the scoping problem. I think this is a compareGroups problem, because if I use the same architecture with glm there's no such problem. So the best I can do is to take the dataframe out of the function environment, which I think doesn't properly answer your question.
But I hope the first two points I made get you closer to an answer.
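For the environment problem itself, one general base R idiom is to embed the data in the call rather than refer to it by name. The sketch below uses lm() as a stand-in, with fit_portable and local_df as hypothetical names. I have not checked whether compareGroups and its [ method behave the same way, so treat this as an assumption to test rather than a confirmed fix:

```r
# Base R only: lm() stands in for compareGroups() in this sketch.
fit_portable <- function() {
  local_df <- mtcars
  # bquote() substitutes the data frame itself for .(local_df), so the call
  # recorded by lm() carries the data instead of a name that may go out of scope.
  eval(bquote(lm(mpg ~ wt, data = .(local_df))))
}

fit <- fit_portable()

# The stored call no longer depends on any variable in the function's frame,
# so update() can re-evaluate it from the global environment.
refit <- update(fit, . ~ . + hp)
names(coef(refit))
```

The trade-off is that the stored call now contains the whole dataframe, so printing the call or the fitted object can be very verbose.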