I have a data frame with one row per author, recording the author's gender, their role on the project (first or senior author), and an identifier for the paper (PMID); see below.
What I need is to create a 2x2 contingency table so I can calculate an odds ratio for the association between being a female first author and having a female senior author. To get that, I need to calculate the following:
- A: Number of times first author is female AND senior author is female
- B: Number of times first author is female AND senior author is male
- C: Number of times first author is male AND senior author is male
- D: Number of times first author is male AND senior author is female
PMIDs that have only a senior author or only a first author should be dropped.
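For reference, with the cells labelled as in the list above, the odds ratio being asked for can be written as a cross-product ratio. This assumes the usual sample odds ratio for a 2x2 table; note that C and D here follow the labelling in this question (C is male/male, D is male first with female senior), which is not the conventional cell order.

```latex
\[
\mathrm{OR}
  = \frac{A / B}{D / C}
  = \frac{A \cdot C}{B \cdot D}
\]
```

Here A / B is the odds of a female senior author when the first author is female, and D / C is the same odds when the first author is male.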
I have grouped the table (see below) by PMID, so I really just need to figure out how to count each of the cases above. I'm having a hard time with this and would greatly appreciate any help!
# A tibble: 178,056 x 3
# Groups: pmid [101,907]
gender authorship pmid
<chr> <chr> <chr>
1 male First 18958667
2 male Senior 18958667
3 male First 18958651
4 male First 18751818
5 male Senior 18751818
6 male First 18751811
7 male Senior 18751811
8 female First 18751810
9 female Senior 18751810
10 male First 18088800
11 male Senior 18088800
12 male First 17710072
13 female First 17977216
14 male Senior 17762065
15 male First 17611457
16 male First 17611433
17 male First 17532688
18 male Senior 17532688
19 female First 17405310
20 male Senior 17386862
21 female First 17319096
22 male Senior 17319096
23 female First 17300028
24 male First 17282480
25 female First 17177771
26 male First 17124681
27 female First 17093906
28 female First 17042011
29 male Senior 17042011
30 female First 17042010
31 male Senior 17042010
32 female First 17042006
33 male Senior 17042006
34 female First 17042003
35 female First 17042002
36 male Senior 17042002
37 male First 17042001
38 female First 17041999
39 male Senior 17041997
40 female First 17041995
41 female First 17041994
42 female First 17041993
43 female Senior 17041993
44 female First 17041992
45 female Senior 17041992
46 female First 17041991
47 male First 17041990
48 male Senior 17041990
49 male First 17041989
50 male Senior 17041989
A tidyverse solution, written code-golf style:
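A minimal sketch of one tidyverse approach follows. It is deliberately not golfed and is not necessarily the code this answer originally had in mind. It assumes each PMID has at most one First row and at most one Senior row, and that gender is always "female" or "male". The object names (`authors`, `pairs`, `tab`, `cells`) and the toy data are illustrative: most rows are copied from the sample in the question, and PMID 00000001 is made up to fill the male-first/female-senior cell.

```r
library(dplyr)
library(tidyr)

# Toy data in the same shape as the question's tibble (illustrative only).
authors <- tibble::tribble(
  ~gender,  ~authorship, ~pmid,
  "male",   "First",     "18958667",
  "male",   "Senior",    "18958667",
  "male",   "First",     "18958651",  # no senior author: dropped below
  "female", "First",     "18751810",
  "female", "Senior",    "18751810",
  "female", "First",     "17319096",
  "male",   "Senior",    "17319096",
  "male",   "Senior",    "17762065",  # no first author: dropped below
  "male",   "First",     "17532688",
  "male",   "Senior",    "17532688",
  "female", "First",     "17041993",
  "female", "Senior",    "17041993",
  "male",   "First",     "00000001",  # made-up PMID
  "female", "Senior",    "00000001"
)

# One row per PMID, with the first and senior author's gender side by side.
pairs <- authors %>%
  ungroup() %>%
  pivot_wider(names_from = authorship, values_from = gender) %>%
  drop_na(First, Senior) %>%  # keep only PMIDs that have both roles
  mutate(across(c(First, Senior),
                ~ factor(.x, levels = c("female", "male"))))

# 2x2 contingency table: rows = first author, columns = senior author.
tab <- table(first = pairs$First, senior = pairs$Senior)

# Cells named as in the question; as.numeric() avoids integer overflow
# when multiplying large counts.
cells <- c(
  A = as.numeric(tab["female", "female"]),
  B = as.numeric(tab["female", "male"]),
  C = as.numeric(tab["male",   "male"]),
  D = as.numeric(tab["male",   "female"])
)

odds_ratio <- unname((cells["A"] * cells["C"]) / (cells["B"] * cells["D"]))
```

On the toy rows above the cells come out as A = 2, B = 1, C = 2, D = 1, so `odds_ratio` is 4. On the real data, replace `authors` with the grouped tibble from the question. `fisher.test(tab)` is another option if a confidence interval is wanted, though its reported estimate is a conditional maximum-likelihood odds ratio and will generally differ slightly from the cross-product value.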