How do I obtain a one-tailed p-value for Games-Howell's test (as calculated by most R functions)?

55 views Asked by At

I am trying to understand:

1 - why p-value is calculated as it is in Games Howell test, and

2 - how to modify an R function to obtain a one-sided (alternative: greater) p-value for the test

When it comes to calculating the p-value for the test, most R functions I can find (here, and here) calculate it as such:

p <- ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)

With:

  • t: Absolute value of Welch's t-statistic
  • groups: number of groups being compared
  • df: degrees of freedom, estimated using Welch's correction
  • sqrt(2): q=t*sqrt(2), with q being Tukey's test

I cannot find a manual or a paper that explains the math behind that p-value. For example, why the p-value for this two-sided test is calculated like it is and not like this:

p <- 2 * ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)

In a two-sided test, shouldn't the p-value be the probability that I see a value more extreme than q (t*sqrt(2)) in the specified Tukey distribution? I would expect this computation to be done by finding the probability of obtaining values more extreme than |t|*sqrt(2) and multiplying that value by 2 since since Tukey's distribution is symmetric. Have I become too accustomed to Normal distributions and I'm missing something major?

Furthermore, I would be truly grateful if someone could point me in the right direction to modify the following R function to calculate a "one-sided" version of Games-Howell's test (specifically I'm interested in implementing the alternative "mean of group B > mean of group A"):

(Function taken from: https://rpubs.com/aaronsc32/games-howell-test )

games.howell <- function(grp, obs) {
  
  #Create combinations
  combs <- combn(unique(grp), 2)
  
  # Statistics that will be used throughout the calculations:
  # n = sample size of each group
  # groups = number of groups in data
  # Mean = means of each group sample
  # std = variance of each group sample
  n <- tapply(obs, grp, length)
  groups <- length(tapply(obs, grp, length))
  Mean <- tapply(obs, grp, mean)
  std <- tapply(obs, grp, var)
  
  statistics <- lapply(1:ncol(combs), function(x) {
    
    mean.diff <- Mean[combs[2,x]] - Mean[combs[1,x]]
    
    #t-values
    t <- abs(Mean[combs[1,x]] - Mean[combs[2,x]]) / sqrt((std[combs[1,x]] / n[combs[1,x]]) + (std[combs[2,x]] / n[combs[2,x]]))
    
    # Degrees of Freedom
    df <- (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]])^2 / # Numerator Degrees of Freedom
            ((std[combs[1,x]] / n[combs[1,x]])^2 / (n[combs[1,x]] - 1) + # Part 1 of Denominator Degrees of Freedom 
              (std[combs[2,x]] / n[combs[2,x]])^2 / (n[combs[2,x]] - 1)) # Part 2 of Denominator Degrees of Freedom
    
    #p-values
    p <- ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)
    
    # Sigma standard error
    se <- sqrt(0.5 * (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]]))
          
    # Upper Confidence Limit
    upper.conf <- lapply(1:ncol(combs), function(x) {
      mean.diff + qtukey(p = 0.95, nmeans = groups, df = df) * se
    })[[1]]
    
    # Lower Confidence Limit
    lower.conf <- lapply(1:ncol(combs), function(x) {
      mean.diff - qtukey(p = 0.95, nmeans = groups, df = df) * se
    })[[1]]
    
    # Group Combinations
    grp.comb <- paste(combs[1,x], ':', combs[2,x])
    
    # Collect all statistics into list
    stats <- list(grp.comb, mean.diff, se, t, df, p, upper.conf, lower.conf)
  })
  
  # Unlist statistics collected earlier
  stats.unlisted <- lapply(statistics, function(x) {
    unlist(x)
  })
  
  # Create dataframe from flattened list
  results <- data.frame(matrix(unlist(stats.unlisted), nrow = length(stats.unlisted), byrow=TRUE))
  
  # Select columns set as factors that should be numeric and change with as.numeric
  results[c(2, 3:ncol(results))] <- round(as.numeric(as.matrix(results[c(2, 3:ncol(results))])), digits = 3)
  
  # Rename data frame columns
  colnames(results) <- c('groups', 'Mean Difference', 'Standard Error', 't', 'df', 'p', 'upper limit', 'lower limit')

  return(results)
}

Any help or suggestion is immensely appreciated.

I checked Games-Howell original paper, but they talked only about the two-tailed test and said that the test should be performed as implemented in the R functions I quoted above, with no further justification or explanation (it's probably obvious to a Statistician that that's the right way to do it, it is not obvious to me unfortunately).

0

There are 0 answers