I am trying to understand:
1 - why p-value is calculated as it is in Games Howell test, and
2 - how to modify an R function to obtain a one-sided (alternative: greater) p-value for the test
When it comes to calculating the p-value for the test, most R functions I can find (here, and here) calculate it as such:
p <- ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)
With:
- t: Absolute value of Welch's t-statistic
- groups: number of groups being compared
- df: degrees of freedom, estimated using Welch's correction
- sqrt(2): q=t*sqrt(2), with q being Tukey's test
I cannot find a manual or a paper that explains the math behind that p-value. For example, why the p-value for this two-sided test is calculated like it is and not like this:
p <- 2 * ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)
In a two-sided test, shouldn't the p-value be the probability that I see a value more extreme than q (t*sqrt(2)) in the specified Tukey distribution? I would expect this computation to be done by finding the probability of obtaining values more extreme than |t|*sqrt(2) and multiplying that value by 2 since since Tukey's distribution is symmetric. Have I become too accustomed to Normal distributions and I'm missing something major?
Furthermore, I would be truly grateful if someone could point me in the right direction to modify the following R function to calculate a "one-sided" version of Games-Howell's test (specifically I'm interested in implementing the alternative "mean of group B > mean of group A"):
(Function taken from: https://rpubs.com/aaronsc32/games-howell-test )
games.howell <- function(grp, obs) {
#Create combinations
combs <- combn(unique(grp), 2)
# Statistics that will be used throughout the calculations:
# n = sample size of each group
# groups = number of groups in data
# Mean = means of each group sample
# std = variance of each group sample
n <- tapply(obs, grp, length)
groups <- length(tapply(obs, grp, length))
Mean <- tapply(obs, grp, mean)
std <- tapply(obs, grp, var)
statistics <- lapply(1:ncol(combs), function(x) {
mean.diff <- Mean[combs[2,x]] - Mean[combs[1,x]]
#t-values
t <- abs(Mean[combs[1,x]] - Mean[combs[2,x]]) / sqrt((std[combs[1,x]] / n[combs[1,x]]) + (std[combs[2,x]] / n[combs[2,x]]))
# Degrees of Freedom
df <- (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]])^2 / # Numerator Degrees of Freedom
((std[combs[1,x]] / n[combs[1,x]])^2 / (n[combs[1,x]] - 1) + # Part 1 of Denominator Degrees of Freedom
(std[combs[2,x]] / n[combs[2,x]])^2 / (n[combs[2,x]] - 1)) # Part 2 of Denominator Degrees of Freedom
#p-values
p <- ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)
# Sigma standard error
se <- sqrt(0.5 * (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]]))
# Upper Confidence Limit
upper.conf <- lapply(1:ncol(combs), function(x) {
mean.diff + qtukey(p = 0.95, nmeans = groups, df = df) * se
})[[1]]
# Lower Confidence Limit
lower.conf <- lapply(1:ncol(combs), function(x) {
mean.diff - qtukey(p = 0.95, nmeans = groups, df = df) * se
})[[1]]
# Group Combinations
grp.comb <- paste(combs[1,x], ':', combs[2,x])
# Collect all statistics into list
stats <- list(grp.comb, mean.diff, se, t, df, p, upper.conf, lower.conf)
})
# Unlist statistics collected earlier
stats.unlisted <- lapply(statistics, function(x) {
unlist(x)
})
# Create dataframe from flattened list
results <- data.frame(matrix(unlist(stats.unlisted), nrow = length(stats.unlisted), byrow=TRUE))
# Select columns set as factors that should be numeric and change with as.numeric
results[c(2, 3:ncol(results))] <- round(as.numeric(as.matrix(results[c(2, 3:ncol(results))])), digits = 3)
# Rename data frame columns
colnames(results) <- c('groups', 'Mean Difference', 'Standard Error', 't', 'df', 'p', 'upper limit', 'lower limit')
return(results)
}
Any help or suggestion is immensely appreciated.
I checked Games-Howell original paper, but they talked only about the two-tailed test and said that the test should be performed as implemented in the R functions I quoted above, with no further justification or explanation (it's probably obvious to a Statistician that that's the right way to do it, it is not obvious to me unfortunately).