Building contingency table

Question

Building contingency table

320 views Asked by Sos At 29 August 2017 at 11:58

I have a table like this:

df <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
                  P2 = c(0,-2,1,2,1,0,"A"),
                  P3 = c(-1,2,0,2,1,0,"B"),
                  P4 = c(2,0,-1,0,-1,0,"B"),
                  Names = c("G1","G2","G3","G1","G2","G3","Group"),
                  stringsAsFactors = FALSE)

Which becomes

Names    P1   P2    P3   P4
G1       1    0     -1   2
G2       0    -2    2    0
G3       0    1     0    -1
G1       0    2     2    0
G2       0    1     1    -1
G3       0    0     0    0
Group    A    A     B    B

Here, A and B are grouping variables for P1, P2, P3, P4.

I want to build a contingency for Ids (G1, G2...), Group (A,B), and Var (-2,-1,0,1,2) table such as:

Id    Group Var    Count
G1    A     -2     0
G1    A     -1     0
G1    A     0      1
G1    A     1      1
G1    A     2      0
G1    B     -2     0
G1    B     -1     1
G1    B     0      0
G1    B     1      0
G1    B     2      1
G2    A     -2     1
G2    A     -1     0
G2    A     0      1
...

Is there a way to do it in R without using lots of loops?

Original Q&A

There are 2 answers

mdag02 On 29 August 2017 at 13:31

With

library(tidyverse)

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

We reshape the table and recode the P* variables in group. We then count and complete the missing cases. Resulting in :

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n()) %>% 
  ungroup() %>% 
  complete(Id, group, v, fill = list("Count" = 0))

And if you don't need all combinations in the output, just use :

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n())

# A tibble: 17 x 4
# Groups:   Id, group [?]
      Id    group  v     Count
      <chr> <chr>  <int> <int>
 1    G1     A     0     2
 2    G1     A     1     1
 3    G1     A     2     1
 4    G1     B    -1     1
 5    G1     B     0     1
 6    G1     B     2     2
 7    G2     A    -2     1
 8    G2     A     0     2
 9    G2     A     1     1
10    G2     B    -1     1
11    G2     B     0     1
12    G2     B     1     1
13    G2     B     2     1
14    G3     A     0     3
15    G3     A     1     1
16    G3     B    -1     1
17    G3     B     0     3

**Jaap** · Accepted Answer · 2017-08-29T13:13:30+00:00

Assuming you want to group the P1 & P2 columns as A and the P3 & P4 columns as B, you could approach it as follows with the data.table-package:

library(data.table)
DT <- melt(melt(setDT(df),
                measure.vars = list(c(2,3),c(4,5)),
                value.name = c("A","B")),
           id = 1, measure.vars = 3:4, variable.name = 'group'
           )[order(Id,group)][, val2 := value]

DT[CJ(Id = Id, group = group, value = value, unique = TRUE)
   , on = .(Id, group, value)
   ][, .(counts = sum(!is.na(val2))), by = .(Id, group, value)]

which results in:

    Id group value counts
 1: G1     A    -2      0
 2: G1     A    -1      0
 3: G1     A     0      2
 4: G1     A     1      1
 5: G1     A     2      1
 6: G1     B    -2      0
 7: G1     B    -1      1
 8: G1     B     0      1
 9: G1     B     1      0
10: G1     B     2      2
11: G2     A    -2      1
12: G2     A    -1      0
13: G2     A     0      2
14: G2     A     1      1
15: G2     A     2      0
16: G2     B    -2      0
17: G2     B    -1      1
18: G2     B     0      1
19: G2     B     1      1
20: G2     B     2      1
21: G3     A    -2      0
22: G3     A    -1      0
23: G3     A     0      3
24: G3     A     1      1
25: G3     A     2      0
26: G3     B    -2      0
27: G3     B    -1      1
28: G3     B     0      3
29: G3     B     1      0
30: G3     B     2      0

Used data:

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

Note that I omitted the 'Group'-row because you stated in the comments that these were just to indicated to which groups the P1 - P4 columns should belong.

TechQA.

Building contingency table

There are 2 answers

Related Questions in R

Related Questions in CONTINGENCY

Popular Questions

Popular Tags

Trending Questions