Two-way contingency table in R

Question

Two-way contingency table in R

846 views Asked by Pawel888 At 03 September 2017 at 18:09

I have a dataframe and I want to output a two-way contingency table from two of the columns. They both have values "Too Little", "About Right" or "Too Much".

I'm typing

df %>%
  filter(!is.na(col1)) %>%
  group_by(col1) %>%
  summarise(count = n())

for both of them separately and get something like this:

col1        count
<fctr>      <int>
Too Little  19259           
About Right 9539            
Too Much    2816

What I would like to achieve is this:

       Too Little   About Right   Too Much   Total
col1   19259        9539          2816       31614
col2   20619        9374          2262       32255
Total  39878       18913          5078       63869

I've been trying to use table function

addmargins(table(df$col1, df$col2))

But the result is not what I want

              Too Little About Right Too Much   Sum
  Too Little       13770        4424      740 18934
  About Right       4901        3706      700  9307
  Too Much          1250         800      679  2729
  Sum              19921        8930     2119 30970

Original Q&A

There are 2 answers

**lukeA** · Answer 1 · 2017-09-03T19:12:19+00:00

I'd give tabulate a try, which is the foundation for table (see ?tabulate). For example given

set.seed(123)
vals <- LETTERS[1:3]
df <- as.data.frame(replicate(3, sample(vals, 5, T)))
df <- data.frame(lapply(df, "levels<-", vals))

then you could do

m <- t(sapply(df, tabulate, nbins = length(vals)))
colnames(m) <- vals
addmargins(m)
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15

Or (via @thelatemail) just

addmargins(t(sapply(df, table)))
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15

**zx8754** · Answer 2 · 2017-09-03T19:36:23+00:00

We can use table in a loop then rbind:

# Using dummy data from @lukeA's answer

addmargins(do.call(rbind, lapply(df1, table)))
#     A B C Sum
# V1  1 1 3   5
# V2  1 3 1   5
# V3  1 2 2   5
# Sum 3 6 6  15

Benchmarking

# bigger data
set.seed(123)
vals <- LETTERS[1:20]
df1 <- as.data.frame(replicate(20, sample(vals, 100000, T)))
df1 <- data.frame(lapply(df1, "levels<-", vals))


microbenchmark::microbenchmark(
  lukeA = {
    m1 <- t(sapply(df1, tabulate, nbins = length(vals)))
    colnames(m1) <- vals
    m1 <- addmargins(m1)
  },
  # as vals only used for luke's solution, keep it in.
  lukeA_1 = {
    vals <- LETTERS[1:20]
    m2 <- t(sapply(df1, tabulate, nbins = length(vals)))
    colnames(m2) <- vals
    m2 <- addmargins(m2)
  },
  thelatemail = {m3 <- addmargins(t(sapply(df1, table)))}, 
  zx8754 = {m4 <- addmargins(do.call(rbind, lapply(df1, table)))}
)
# Unit: milliseconds
#        expr       min        lq      mean    median        uq        max neval
#       lukeA  2.349969  2.371922  2.518447  2.473839  2.558653   3.363738   100
#     lukeA_1  2.351680  2.377196  2.523473  2.473839  2.542831   3.459242   100
# thelatemail 38.316506 42.054136 43.785777 42.674912 44.234193  90.287809   100
#      zx8754 38.695101 41.979728 44.933602 42.762006 44.244314 110.834292   100

TechQA.

Two-way contingency table in R

There are 2 answers

Related Questions in R

Related Questions in TWO-WAY

Related Questions in CONTINGENCY

Popular Questions

Popular Tags

Trending Questions