Kruskal-Wallis test and subsetting

Could you please help me perform a Kruskal-Wallis test on a subset of my data? I would like to test for differences in "N" between "Producers".

names(Isotope.Data)
[1] "Species"         "Name"            "Group"           "Simple_Group"       "Trophic_Group"  
[6] "Sample"          "N"               "C" 

In my CSV file I have a column "Trophic_Group", which separates Consumers and Producers.

table(Isotope.Data$Trophic_Group)

Consumer Producers  
    61         18 

Under the column heading Simple_Group, I have three Producers: Rhodophyta, Seagrass and Phaeophyceae.

table(Isotope.Data$Simple_Group)

 Abalone  Loliginidae      Octopus Phaeophyceae   Rhodophyta     Seagrass      Teleost 
      24            2           12            6            9            3           20 
Tunicate 
       3 

I have tried numerous things, but I get various error messages. Would anyone be able to improve on the following code?

kruskal.test(C ~ Simple_Group, data = Isotope.Data, subset = Isotope.Data$Trophic_Group = "Producers") 

P.S. I have created a separate CSV file which only includes Primary Producers. However, a subsequent Dunn test of multiple comparisons, used to determine which levels differ from each other, gives different significance levels from the same test run on the data that includes both Consumers and Producers.

There are 2 answers

RadRel On

You can also use the map() function from the purrr package to apply the test to each group once the data has been split:

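library(dplyr)  # provides %>% and group_split()
library(purrr)  # provides map()
# split df by phase, then run the test on each piece (.x is the current piece)
test <- df %>% group_split(phase) %>% map(~ kruskal.test(val ~ distance, data = .x))
test
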
maycca On

Maybe this answer will be helpful? It is based on @user295691's answer to:

Kruskal-Wallis test: create lapply function to subset data.frame?

Here you identify the individual groups you want to run the test for, and use split() to subset the data frame correctly.

Dummy example:

# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)

df<-data.frame(val, distance, phase)

# get unique groups
ii<-unique(df$phase)

# run Kruskal test on one group; with data = df, subset is evaluated on df's columns
kruskal.test(val ~ distance, data = df,
             subset = phase == "c")

And now apply the kruskal.test to each group using split:

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

or loop over the group names with an anonymous function:

lapply(ii, function(i) { kruskal.test(df$val ~ df$distance, subset=df$phase==i )})

Both produce test results for each group (the output below is from the second form):

[[1]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.14881, df = 1, p-value = 0.6997


[[2]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.11688, df = 1, p-value = 0.7324


[[3]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.0059524, df = 1, p-value = 0.9385

Or just get the p-values (notice the addition of $p.value after the kruskal.test):

lapply(ii, function(i) {
  kruskal.test(df$val ~ df$distance,
               subset = df$phase == i)$p.value
})
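
Applied to the layout in the question, the same idea might look like the sketch below. The data frame here is made up (random values; only the column names and group labels come from the question), and it assumes the response is "N", as the question text states, rather than the "C" used in the attempted call. Two things differ from that attempt: the comparison uses == instead of =, and the column is named bare because subset is evaluated inside data.

```r
# Made-up stand-in for Isotope.Data; values are random, not real measurements.
set.seed(1)
Isotope.Data <- data.frame(
  Trophic_Group = rep(c("Consumer", "Producers"), times = c(12, 18)),
  Simple_Group  = c(rep(c("Abalone", "Octopus", "Teleost"), each = 4),
                    rep(c("Phaeophyceae", "Rhodophyta", "Seagrass"),
                        times = c(6, 9, 3))),
  N = runif(30, min = 0, max = 15),
  stringsAsFactors = TRUE
)

# Option 1: subset inside the call (note `==`, and no Isotope.Data$ prefix).
res1 <- kruskal.test(N ~ Simple_Group, data = Isotope.Data,
                     subset = Trophic_Group == "Producers")

# Option 2: subset first, then drop the now-empty consumer levels.
producers <- droplevels(subset(Isotope.Data, Trophic_Group == "Producers"))
res2 <- kruskal.test(N ~ Simple_Group, data = producers)

res1
table(producers$Simple_Group)
```

Both options test the same 18 producer rows across the three producer groups, so they should agree. The droplevels() step in option 2 only matters for later steps that look at the factor levels, such as tables or post-hoc comparisons.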