Perform multiple Kruskal tests in for loop

64 views Asked by At

I want to perform a kruskal_test for multiple variables. I have been able to do this with anova tests as follows:

vars = c("Sepal.Length" , "Sepal.Width" ,"Petal.Length" ,  "Petal.Width")

for (i in vars) {
  
  f <- reformulate(paste(i,'~Species'))
  test = aov(data = iris, f )
  
  print(i)
  print(summary(test))

But I can't seem to use the same method for rstatix::kruskal_test or kruskal.test:

for (i in vars) {
  
  f <- reformulate(paste(i,'~Species'))
  test = kruskal.test(data = iris,formula= f )
  test = kruskal_test(data = iris,formula= f )
  print(i)
  print(summary(test))
  
}

I get errors: Error in kruskal.test.default(data = iris, formula = f) : argument "x" is missing, with no default

or

Error in kruskal.test.formula(formula, data = data, ...) : 'formula' missing or incorrect

1

There are 1 answers

0
jay.sf On BEST ANSWER

In aov you need this kind of formula,

> reformulate(paste('Petal.Width','~Species'))
~(Petal.Width ~ Species)

whereas in kruskal.test, it's

> reformulate('Petal.Width', 'Species')
Species ~ Petal.Width

The generic function appears to have an unexpected issue in its method dispatch mechanism, indicating a possible bug in its method selection process (R 4.3.2). Even though the arguments are named, the order matters (default order is kruskal.test(formula, data), we have kruskal.test(data, formula)):

> kruskal.test(formula=Petal.Width ~ Species, data=iris)

    Kruskal-Wallis rank sum test

data:  Petal.Width by Species
Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16

However:

> kruskal.test(data=iris, formula=Petal.Width ~ Species)
Error in kruskal.test.default(data = iris, formula = Petal.Width ~ Species) : 
  argument "x" is missing, with no default

It works if you use the method directly

> for (i in vars) {
+   f <- reformulate(i, 'Species')
+   test <- stats:::kruskal.test.formula(data=iris, formula=f)
+   print(test)
+ }

    Kruskal-Wallis rank sum test

data:  Species by Sepal.Length
Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09


    Kruskal-Wallis rank sum test

data:  Species by Sepal.Width
Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
...

Better use lapply though:

> lapply(vars, \(v) stats:::kruskal.test.formula(data=iris, formula=reformulate(v, 'Species')))
[[1]]

    Kruskal-Wallis rank sum test

data:  Species by Sepal.Length
Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09


[[2]]

    Kruskal-Wallis rank sum test

data:  Species by Sepal.Width
Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
...