How to compute TukeyHSD letters with multcompLetters4 and associate them to the originl dataset by group

53 views Asked by Betty888 At 14 November 2023 at 00:09

I’m starting with R, and I’m facing an issue but I ignore its origin, so I come here to ask for your help and I hope you will be able to help me I have a dataset containing the following columns: Y = response trt : 1st explanatory variable lot : 2nd explanatory variable Location : is the column to group my data, so “aov” runs by Location. My objective is to extract Tukey-test goup letters output + Studentized residuals to flag outliers. For that , I’m using the following code :

library(multcomp)
library(multcompView)
library(dplyr)
library(tidyr)
library(na.tools)
library(tibble)

#My datasets and my variables
output <- as.data.frame(dataset)
output$response <- as.numeric(output$response)
output$trt <- as.factor(output$trt)
output$lot <- as.factor(output$lot)
output <- output %>% 
     group_by(location) %>% 
     mutate(row=row_number(),unique_trt = n_distinct(trt), unique_lot = n_distinct(lot), var = var(response, na.rm = TRUE))


# create a subset dataset where I use filter to exclude cases where one of the factors has only 1 level and where variance in data is null 
lsplit <-  output %>%
         filter( var != 0 & unique_trt != 1 & unique_lot != 1)
lsplit <- lsplit %>% mutate(response3 = replace_na(response, mean(response, na.rm = TRUE)))

# split my sub-dataset by "location"
lsplit <- split(lsplit, ~ location)
# Create a function of lm and TukeyHSD to apply by "Location" and combine the output to the original dataset "output"
all_results <- lapply(lsplit,function(x){
     model <- aov(reponse3 ~ trt + lot, data = x, na.rm=TRUE)
     tukey <- TukeyHSD(model)
     cld <- multcompLetters4(model, tukey)
     TK <- group_by(x,trt) %>% summarise (mean = mean(response3, na.rm=TRUE)) %>% arrange(desc(mean))
     cld <- as.data.frame.list(cld$trt)
     TK$cld <- cld$Letters
     x$stud_resid <- studres(model)
     x <- left_join(x, TK, by = "trt")
      x
  })
output <- left_join(output,bind_rows(all_results) %>% select(location, mean, stud_resid, row, cld) , by=c("location", "row"))

This code works perfectly for a sample of locations that I’m using for testing, but once I apply it to the whole dataset (I’m using it on a dataset of around 20 M rows in Power query), I get an error message and I don’t know from which location this comes, that’s why I can’t provide an example to reproduce the error. Here is the error message I get :

“ Error in vec2mat(x) : 36 NAs not allowed, found in x” Any help os most welcome Thank you in advance for your help !

Original Q&A

TechQA.

How to compute TukeyHSD letters with multcompLetters4 and associate them to the originl dataset by group

There are 0 answers

Related Questions in R

Related Questions in STATISTICS

Related Questions in ANOVA

Related Questions in TUKEY

Related Questions in MULTCOMPVIEW

Popular Questions

Popular Tags

Trending Questions