Perform a t-test on a data.frame

I'm trying to run a t-test (and get the p.value) for each numeric column of a data.frame. One column holds the groups (good vs. bad) and the rest of the columns are numeric.

I generated a toy dataset here:

W <- rep(letters[seq( from = 1, to = 2)], 25)
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
test_data <- data.frame(W, X, Y, Z)

Then I transform the data into long format:

library(reshape2)  # provides melt()
melt_testdata <- melt(test_data)

And ran the t-test:

lapply(unique(melt_testdata$variable),function(x){
  Good <- subset(melt_testdata, W  == 'a' & variable ==x)$variable
  Bad <- subset(melt_testdata, W == 'b' & variable ==x)$variable
  t.test(Good,Bad)$p.value
})

But instead of getting the t-test results, I got the following error and warnings:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) :
  Calling var(x) on a factor x is deprecated and will become an error.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
3: In mean.default(y) : argument is not numeric or logical: returning NA
4: In var(y) :
  Calling var(x) on a factor x is deprecated and will become an error.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

Then I tried writing a loop (my first time):

good <- matrix(,50)
bad <- matrix(,50)
cnt=3
out <- rep(0,cnt)


for (i in 2:4){
  good[i] <- subset(test_data, W == 'a', select= test_data[,i])
  bad[i] <- subset(test_data, W == 'b', select= test_data[,i])
  out[i] <- print(t.test(good[[i]], bad[[i]])$p.value)
}

I'm still not getting p-values. This is the error message:

Error in x[j] : only 0's may be mixed with negative subscripts

I appreciate any help in any method, thanks!

There are 2 answers

Benjamin (best answer)

I think you'll have better luck with the formula method of t.test. Try:

library(broom)
library(magrittr)
library(dplyr)

W <- rep(letters[seq( from = 1, to = 2)], 25)
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
test_data <- data.frame(W, X, Y, Z)

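# Run the formula method on each numeric column, then stack the tidy results.
# bind_rows(.id = ...) stores the list names (X, Y, Z) in a column directly,
# rather than relying on row names surviving do.call("rbind", .).
lapply(test_data[c("X", "Y", "Z")],
       function(x, y) t.test(x ~ y),
       y = test_data[["W"]]) %>% 
  lapply(tidy) %>% 
  bind_rows(.id = "variable")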
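
If only the p-values are needed, the same formula call can probably be written in base R without broom or dplyr. The following is a minimal sketch, assuming the toy data from the question; set.seed() and the name p_values are additions for illustration and are not part of the original post.

```r
# Toy data as in the question; set.seed() is added only for reproducibility.
set.seed(1)
W <- rep(letters[1:2], 25)
X <- rnorm(n = 50, mean = 10, sd = 5)
Y <- rnorm(n = 50, mean = 15, sd = 6)
Z <- rnorm(n = 50, mean = 20, sd = 5)
test_data <- data.frame(W, X, Y, Z)

# One t-test per numeric column, grouped by W; keep only the p-value.
# p_values is a hypothetical name chosen for this sketch.
p_values <- sapply(test_data[c("X", "Y", "Z")],
                   function(col) t.test(col ~ test_data$W)$p.value)
p_values
```

sapply() keeps the column names, so the result is a named numeric vector with one p-value each for X, Y and Z.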

Edit:

With stricter adherence to the dplyr philosophy, you can use the following, which is actually a bit cleaner looking:

library(broom)
library(dplyr)
library(tidyr)

W <- rep(letters[seq( from = 1, to = 2)], 25)
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)

test_data <- data.frame(W, X, Y, Z) 

test_data %>% 
  gather(variable, value, X:Z) %>% 
  group_by(variable) %>% 
  do(., tidy(t.test(value ~ W, data = .)))
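
In more recent releases of tidyr and dplyr, gather() and do() are marked as superseded. A rough equivalent of the pipeline above, assuming a version of dplyr that provides group_modify() and a version of tidyr that provides pivot_longer(), might look like this; set.seed() and the name results are additions for illustration.

```r
library(dplyr)
library(tidyr)
library(broom)

# Toy data as in the question; set.seed() is added only for reproducibility.
set.seed(1)
W <- rep(letters[1:2], 25)
X <- rnorm(n = 50, mean = 10, sd = 5)
Y <- rnorm(n = 50, mean = 15, sd = 6)
Z <- rnorm(n = 50, mean = 20, sd = 5)
test_data <- data.frame(W, X, Y, Z)

# Reshape to long format, then run one t-test per variable.
# Inside group_modify(), .x is the data for the current group.
results <- test_data %>%
  pivot_longer(cols = X:Z, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  group_modify(~ tidy(t.test(value ~ W, data = .x))) %>%
  ungroup()
results
```

The output has one row per variable, with the p.value column alongside the other statistics that tidy() reports.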

Richard Telford

Here is a solution using dplyr and the formula argument to t.test. do works on each group defined by the group_by. glance extracts values from the t.test output and makes them into a data.frame.

library(tidyverse)
library(broom)

melt_testdata %>% 
  group_by(variable) %>% 
  do(glance(t.test(value ~ W, data = .)))
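
For reference, the two attempts in the question can also be repaired with small changes. In the first, $variable extracts the factor of column names rather than the numbers, which is what triggers the "argument is not numeric or logical" warnings; $value holds the measurements. In the second, subset(..., select = test_data[,i]) appears to pass the column's values as column indices, hence the subscript error; plain indexing by position avoids that. The sketch below assumes the toy data from the question; set.seed() and the names vars, p_melt and p_loop are additions for illustration.

```r
library(reshape2)

# Toy data as in the question; set.seed() is added only for reproducibility.
set.seed(1)
W <- rep(letters[1:2], 25)
X <- rnorm(n = 50, mean = 10, sd = 5)
Y <- rnorm(n = 50, mean = 15, sd = 6)
Z <- rnorm(n = 50, mean = 20, sd = 5)
test_data <- data.frame(W, X, Y, Z)
melt_testdata <- melt(test_data, id.vars = "W")

# First attempt, corrected: compare the numeric `value` column, not `variable`.
vars <- as.character(unique(melt_testdata$variable))
p_melt <- sapply(vars, function(x) {
  good <- subset(melt_testdata, W == "a" & variable == x)$value
  bad  <- subset(melt_testdata, W == "b" & variable == x)$value
  t.test(good, bad)$p.value
})

# Second attempt, corrected: index columns 2 to 4 by position and
# store each p-value in a preallocated, named vector.
p_loop <- setNames(numeric(3), names(test_data)[2:4])
for (i in 2:4) {
  good <- test_data[test_data$W == "a", i]
  bad  <- test_data[test_data$W == "b", i]
  p_loop[i - 1] <- t.test(good, bad)$p.value
}

# Both approaches test the same groups, so the results should agree.
all.equal(p_melt, p_loop)
```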