I am trying to perform multiple, independent t-tests on a large data frame. When I wrap the test in a function so that I can loop over it, rstatix does not substitute the values of the function arguments; it treats the argument names themselves as column names.
Example data
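if(!require(rstatix)){install.packages("rstatix");library('rstatix')}
if(!require(ggplot2)){install.packages("ggplot2");library('ggplot2')}  # ggplot(), geom_boxplot(), geom_dotplot()
if(!require(ggpubr)){install.packages("ggpubr");library('ggpubr')}  # stat_pvalue_manual()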
set.seed(1)
df <- data.frame(
  Type = sprintf("Type_%s", rep.int(1:2, times = 10)),
  Read = rnorm(20))
T-test
stat.test <- df %>%
  t_test(Read ~ Type, paired = FALSE)
stat.test
Plot without statistics
ggplot(df, aes(x = Type, y = Read)) +
  geom_boxplot(aes(fill = Type)) +
  geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30)
Example function (works fine!)
my.function <-
  function(df, var1, var2) {
    ggplot(df, aes_string(x = var1, y = var2)) +
      geom_boxplot(aes_string(fill = var1)) +
      geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30)
  }
my.function(df, 'Type', 'Read')
My issue
my.function <-
  function(df, var1, var2) {
    stat.test <- df %>%
      t_test(var2 ~ var1, paired = FALSE)
    ggplot(df, aes_string(x = var1, y = var2)) +
      geom_boxplot(aes_string(fill = var1)) +
      geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30) +
      stat_pvalue_manual(stat.test, label = "p", y.position = 2.1)
  }
my.function(df, 'Type', 'Read')
The above returns an error because rstatix thinks var1 and var2 are columns in the example data frame.
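The behavior can be reproduced without rstatix. The base R sketch below shows that a formula written literally stores the symbols var2 and var1 rather than the strings they hold, whereas a formula built from those strings names the real columns. The object names literal and built are chosen here only for illustration.

```r
# Column names arrive as strings, as they do inside my.function
var1 <- "Type"
var2 <- "Read"

# Written literally, the formula keeps the symbols var2 and var1;
# nothing looks up the strings stored in them
literal <- var2 ~ var1
all.vars(literal)
# "var2" "var1"

# Built from the strings, the formula refers to the actual columns
built <- as.formula(paste(var2, "~", var1))
all.vars(built)
# "Read" "Type"
```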
I tried the following two approaches to make R substitute the argument values, but both fail.
Attempt 1
my.function <-
  function(df, var1, var2) {
    stat.test <- df %>%
      t_test(eval(parse(var2)) ~ eval(parse(var1)), paired = FALSE)
    ggplot(df, aes_string(x = var1, y = var2)) +
      geom_boxplot(aes_string(fill = var1)) +
      geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30) +
      stat_pvalue_manual(stat.test, label = "p", y.position = 2.1)
  }
my.function(df, 'Type', 'Read')
Attempt 2
my.function <-
  function(df, var1, var2) {
    stat.test <- df %>%
      t_test(eval(as.name(paste(var2))) ~ eval(as.name(paste(var1))), paired = FALSE)
    ggplot(df, aes_string(x = var1, y = var2)) +
      geom_boxplot(aes_string(fill = var1)) +
      geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30) +
      stat_pvalue_manual(stat.test, label = "p", y.position = 2.1)
  }
my.function(df, 'Type', 'Read')
I looked through the source of the t_test function for any indication of why my attempts failed. I suspected the issue had something to do with the way R handles formulas inside functions. After a bit of manipulation of my original script, I finally got it working.
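A minimal sketch of one way to get this working, assuming the fix is to build the formula from the column-name strings with as.formula() and pass the resulting object to t_test(). The name test.formula is chosen here for illustration, and the rest of the function is unchanged from the failing version above.

```r
library(rstatix)  # t_test(); also provides %>%
library(ggplot2)  # ggplot(), geom_boxplot(), geom_dotplot()
library(ggpubr)   # stat_pvalue_manual()

set.seed(1)
df <- data.frame(
  Type = sprintf("Type_%s", rep.int(1:2, times = 10)),
  Read = rnorm(20))

my.function <-
  function(df, var1, var2) {
    # Assumption: a formula object built from the strings (here Read ~ Type)
    # is what t_test needs, instead of the literal var2 ~ var1
    test.formula <- as.formula(paste(var2, "~", var1))
    stat.test <- df %>%
      t_test(test.formula, paired = FALSE)
    ggplot(df, aes_string(x = var1, y = var2)) +
      geom_boxplot(aes_string(fill = var1)) +
      geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = 1, binwidth = 1/30) +
      stat_pvalue_manual(stat.test, label = "p", y.position = 2.1)
  }

my.function(df, 'Type', 'Read')
```

Because the column names are now ordinary strings, the same function can be called in a loop, for example with lapply() over a character vector of response columns.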