How to apply chisq.test() to each level of several categorical variables?


I want to perform chisq.test() on each level of a categorical variable.

So far, I have managed to run it on each categorical variable as a whole, using the code below.

# Random generation of values for categorical data
set.seed(12)
x <- data.frame(col1 = sample( LETTERS[1:4], 100, replace=TRUE ), 
                col2 = sample( LETTERS[3:6], 100, replace=TRUE ),
                col3 = sample( LETTERS[2:5], 100, replace=TRUE ),
                out = sample(c(1,2),100, replace=TRUE))

# performing chisq.test of each categorical column against the outcome
pval <- data.frame(p.value = sapply(1:3, function(i) chisq.test(x[, i], x[, "out"])$p.value))

#output
    p.value
1 0.33019256
2 0.08523487
3 0.79403367

I am interested in comparing the individual levels across the different outcomes.

# for col1 levels different outcomes
table(x$col1,x$out)

#output
     1  2
  A  8 12
  B 18 10
  C 12 11
  D 18 11

For example, I would like to compare level B in col1 across the outcomes 1 and 2 in out.

How can this be extended (or done in a smarter way) to each level of every categorical variable?

# Expected output
       p.value

col1.A  *****
col1.B  *****
col1.C  *****
.
.
.
col3.E  *****

Thanks for your attention.

Answer by Nick Kennedy (best answer)

This is how you would do it if you wanted a chi-squared test for given probabilities (with p = rep(0.5, 2)) within each level, i.e. a test of whether the two outcomes are equally likely for that level.

I've broken this down to make it easier to understand:

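# p-value for one level of column i: are the outcomes equally likely within it?
getP <- function(lev, x, i) {
  # factor() keeps every outcome category, even one with a zero count
  tab <- table(factor(x$out)[x[, i] == lev])
  chisq.test(tab)$p.value
}
pvalList <- lapply(1:3, function(i) {
  # unique() works whether the column is character (the default since R 4.0)
  # or a factor; levels() returns NULL for character columns
  df <- data.frame(Column = names(x)[i],
                   Category = sort(unique(as.character(x[, i]))),
                   stringsAsFactors = FALSE)
  df$p.value <- sapply(df$Category, getP, x, i, USE.NAMES = FALSE)
  df
})
pval <- do.call("rbind", pvalList) # Convert to single data frame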
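To see what getP computes for a single level, the counts for level B of col1 from the question's table (18 and 10) can be tested directly. This is only a minimal sketch with the counts typed in by hand; the names `counts_B` and `res` are illustrative and not part of the code above.

```r
# Counts for col1 == "B" taken from the table in the question (outcome 1, outcome 2)
counts_B <- c("1" = 18, "2" = 10)

# Given a single vector of counts, chisq.test() runs a goodness-of-fit test
# against equal probabilities (p = rep(1/2, 2)) by default
res <- chisq.test(counts_B)

res$statistic  # (18 - 14)^2 / 14 + (10 - 14)^2 / 14 = 16/7
res$p.value
```

The expected count under equal probabilities is 14 for each outcome, which is where the statistic of 16/7 on one degree of freedom comes from.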

Alternatively, if what you want is actually A vs not A, B vs not B, etc., you could substitute the definition of getP with:

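getP <- function(lev, x, i) {
  # 2 x 2 table: outcome (rows) by "is this level" FALSE/TRUE (columns)
  tab <- table(x$out, x[, i] == lev)
  # test of independence between the outcome and membership of this level
  chisq.test(tab)$p.value
}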
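If the result should look like the expected output in the question (row names such as col1.A), one possible arrangement is sketched below for the "level vs. not level" variant. It is self-contained and rebuilds the question's data; the helper name `getPByName` and the vector `cols` are hypothetical, introduced here only so that columns can be addressed by name rather than by index.

```r
# Same simulated data as in the question; the exact draws depend on the R version
set.seed(12)
x <- data.frame(col1 = sample(LETTERS[1:4], 100, replace = TRUE),
                col2 = sample(LETTERS[3:6], 100, replace = TRUE),
                col3 = sample(LETTERS[2:5], 100, replace = TRUE),
                out  = sample(c(1, 2), 100, replace = TRUE))

# Hypothetical helper: "lev vs. not lev" test of independence against the outcome
getPByName <- function(lev, data, col) {
  tab <- table(data$out, data[[col]] == lev)
  chisq.test(tab)$p.value
}

cols <- c("col1", "col2", "col3")
pvalList <- lapply(cols, function(col) {
  # unique() works for both character and factor columns
  lev <- sort(unique(as.character(data.frame(x)[[col]])))
  p <- vapply(lev, getPByName, numeric(1), data = x, col = col)
  names(p) <- paste(col, lev, sep = ".")  # e.g. "col1.A"
  p
})

# The names of the combined vector become the row names of the data frame
pval <- data.frame(p.value = unlist(pvalList))
pval
```

Swapping the body of `getPByName` for the goodness-of-fit version would give the within-level test instead, with the same output layout.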