Calculate totals in each column and then run a Fisher's test in R

Data:

variant disease control total
A1         1      53    54
A2         6      2     8
A3         15     37    52
A4         0      53    53
A5         65     4     69
A6         4      5     9
A7         3      34    37

I would like to add a row at the bottom with the column totals for disease and control, and then run a Fisher's exact test on each row, adding another column with the p-values from the test.

Desired outcome (p-values made up):

variant disease control total p-value
A1         1      53    54    0.001
A2         6      2     8     0.6921
A3         15     37    52    1
A4         0      53    53    0.98
A5         65     4     69    0.68
A6         4      5     9     0.63
A7         3      34    37    0.832
C_total    94     188

I've tried:

rbind(df, colSums(df[,2:3]), fill=TRUE) 

But this gives me all the column totals in the final two columns.
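A base R alternative for the totals row, as a sketch assuming `df` is a plain data.frame (reconstructed below from the question's table): `fill` is an argument of data.table's `rbind()` method rather than of base `rbind.data.frame()`, so one option is to build the totals as a one-row data frame with the same column names and let `rbind()` match columns by name. The `C_total` label and the `NA` in `total` mirror the desired output; `totals_row` and `df_with_totals` are illustrative names.

```r
# Hypothetical reconstruction of the question's data frame
df <- data.frame(
  variant = paste0("A", 1:7),
  disease = c(1, 6, 15, 0, 65, 4, 3),
  control = c(53, 2, 37, 53, 4, 5, 34)
)
df$total <- df$disease + df$control

# One-row data frame with the same column names, so rbind() matches by name;
# 'total' is left NA, as in the desired output
totals_row <- data.frame(
  variant = "C_total",
  disease = sum(df$disease),
  control = sum(df$control),
  total   = NA
)
df_with_totals <- rbind(df, totals_row)
df_with_totals
```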

I'm not sure about the Fisher's test yet, but I imagine some form of apply function that uses each row's counts and the column totals to build a 2x2 table.
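One way to sketch that apply idea in base R, assuming the 2x2 table for each variant compares its disease/control counts against those of all other variants (the column total minus the row). That is only one reading of "per row and per total"; the data frame is reconstructed from the question, and `tot_disease`/`tot_control` are illustrative names.

```r
# Hypothetical reconstruction of the question's data (variant rows only)
df <- data.frame(
  variant = paste0("A", 1:7),
  disease = c(1, 6, 15, 0, 65, 4, 3),
  control = c(53, 2, 37, 53, 4, 5, 34)
)

tot_disease <- sum(df$disease)
tot_control <- sum(df$control)

# Assumed 2x2 layout per row (matrix() fills column by column):
#            disease               control
# variant    x[1]                  x[2]
# others     tot_disease - x[1]    tot_control - x[2]
df$p_value <- apply(df[, c("disease", "control")], 1, function(x) {
  tab <- matrix(c(x[1], tot_disease - x[1],
                  x[2], tot_control - x[2]),
                nrow = 2)
  fisher.test(tab)$p.value
})
df
```

The p-values are computed on the variant rows only, so a totals row can be appended afterwards without being tested itself.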

Many thanks

There are 2 answers

tmfmnk (best answer)

One dplyr and tibble solution could be:

library(dplyr)
library(tibble)

df %>%
 add_row(variant = "Total", !!!colSums(df[-1])) %>%  # append a row of column sums
 rowwise() %>%                                        # evaluate mutate() per row
 mutate(p_value = chisq.test(c_across(c(disease, control)), p = c(0.5, 0.5))$p.value)

  variant disease control total  p_value
  <chr>     <dbl>   <dbl> <dbl>    <dbl>
1 A1            1      53    54 1.48e-12
2 A2            6       2     8 1.57e- 1
3 A3           15      37    52 2.28e- 3
4 A4            0      53    53 3.34e-13
5 A5           65       4    69 2.08e-13
6 A6            4       5     9 7.39e- 1
7 A7            3      34    37 3.46e- 7
8 Total        94     188   282 2.17e- 8

Since I suppose you are trying to test whether the counts of individuals in the two groups are the same, a chi-square goodness-of-fit test could be used instead.
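As a check on what `chisq.test(..., p = c(0.5, 0.5))` computes, here is a sketch of the goodness-of-fit statistic worked by hand for row A2 (6 disease, 2 control); its p-value should match the 1.57e-1 shown for A2 in the output above. The variable names are illustrative.

```r
# Row A2 from the question: 6 disease, 2 control
counts   <- c(disease = 6, control = 2)
# Expected counts under an equal split, as in p = c(0.5, 0.5): 4 and 4
expected <- sum(counts) * c(0.5, 0.5)
# Pearson statistic: (6 - 4)^2 / 4 + (2 - 4)^2 / 4 = 2
x2 <- sum((counts - expected)^2 / expected)
# Upper tail of a chi-square with (2 categories - 1) = 1 degree of freedom
p_manual <- pchisq(x2, df = 1, lower.tail = FALSE)
# Built-in version; it would warn here because the expected counts are below 5
p_builtin <- suppressWarnings(chisq.test(counts, p = c(0.5, 0.5))$p.value)
c(manual = p_manual, builtin = p_builtin)
```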

r2evans

For the first part of your question (adding the totals row), using data.table:

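library(data.table)
setDT(df)  # rbind(..., fill = TRUE) and the [.N, ...] syntax need a data.table
# append one row of disease/control sums (other columns become NA), then label it
rbind(df, rbind(colSums(df[, 2:3])), fill = TRUE)[.N, variant := "Total"][]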
#    variant disease control total p-value
# 1:      A1       1      53    54  0.0010
# 2:      A2       6       2     8  0.6921
# 3:      A3      15      37    52  1.0000
# 4:      A4       0      53    53  0.9800
# 5:      A5      65       4    69  0.6800
# 6:      A6       4       5     9  0.6300
# 7:      A7       3      34    37  0.8320
# 8:   Total      94     188    NA      NA