I have a dataframe with 37 columns; a representative sample, df, is below:
df <- structure(list(irm = 201201:201202, trans11 = c(379L, 433L),
trans12 = 4:3, trans13 = 5:4, trans14 = c(13L, 3L), trans15 = c(29L,
21L), trans16 = c(0L, 0L), trans21 = c(6L, 4L), trans22 = 2:1,
trans23 = c(0L, 0L), trans24 = 0:1, trans25 = c(0L, 0L),
trans26 = c(0L, 0L), trans31 = c(2L, 2L), trans32 = c(0L,
0L), trans33 = 5:6, trans34 = c(0L, 0L), trans35 = c(7L,
2L), trans36 = c(0L, 0L), trans41 = c(4L, 10L), trans42 = c(0L,
0L), trans43 = c(0L, 0L), trans44 = c(4L, 10L), trans45 = c(3L,
1L), trans46 = c(0L, 0L), trans51 = c(15L, 18L), trans52 = c(0L,
0L), trans53 = c(1L, 1L), trans54 = c(4L, 0L), trans55 = c(96L,
115L), trans56 = c(0L, 0L), trans61 = c(0L, 0L), trans62 = c(0L,
0L), trans63 = c(0L, 0L), trans64 = c(0L, 0L), trans65 = c(0L,
0L), trans66 = c(0L, 0L)), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
The dataframe has 37 columns: trans11...trans16...trans61...trans66 plus the irm column for month.
What I would like to do is the following: for each entry in columns trans11 all the way through trans66, calculate the proportion of the entry relative to the sum of all columns with the same prefix (e.g. trans1), the entry itself included. So for the example here, the first row in columns 2 through 7 would be (0.8813953, 0.009302326, 0.01162791, 0.03023256, 0.06744186, 0), since we need to sum trans11...trans16.
How would I do this for all 36 columns in the larger df?
Is there a way to do this with group_by and starts_with from dplyr? I know a for loop is probably possible but any and all suggestions are welcome.
Thanks
Update: replaced the example with a larger dataframe.
In base R, use prop.table/proportions:
If you need to select the columns using the trans name:
Edit
With the edit of the data, use:
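A minimal sketch of this approach in base R, not necessarily the exact code intended above. It assumes every column to normalise starts with "trans" and that the group is given by the first six characters of the name (so trans11 belongs to trans1). The names trans_cols, prefix and res are only illustrative, and the data is a two-group subset of the df in the question.

```r
# Subset of the question's data: two prefix groups (trans1*, trans2*)
df <- data.frame(
  irm = 201201:201202,
  trans11 = c(379L, 433L), trans12 = 4:3, trans13 = 5:4,
  trans14 = c(13L, 3L), trans15 = c(29L, 21L), trans16 = c(0L, 0L),
  trans21 = c(6L, 4L), trans22 = 2:1, trans23 = c(0L, 0L),
  trans24 = 0:1, trans25 = c(0L, 0L), trans26 = c(0L, 0L)
)

# Select the trans columns by name rather than by position
trans_cols <- grep("^trans", names(df), value = TRUE)

# Assumed grouping rule: "trans11" -> "trans1"
prefix <- substr(trans_cols, 1, 6)

res <- df
for (p in unique(prefix)) {
  cols <- trans_cols[prefix == p]
  # margin = 1 divides each entry by its row total within this group
  res[cols] <- prop.table(as.matrix(df[cols]), margin = 1)
}
res
```

For the first row, the trans1 columns come out as 379/430, 4/430, and so on, which matches the values listed in the question. A group whose row total is zero (trans61...trans66 in the full data) yields NaN from 0/0, so those entries may need replacing afterwards, for example with 0. If df is a data.table, converting it with as.data.frame(df) first keeps the column indexing above behaving as written.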