Using rmultinom to estimate the distribution of holiday party gifts for a list of dataframes/estimates

42 views Asked by At

There is a holiday party with 6 people. Each of the 6 people have an estimate for their gifts_brought, and each person has an estimate for their chance of receiving another person's gift (gifts_received_pct). There is no limit to the number of gifts a person can receive from another person. They can only give/receive gifts from their own team, and they cannot gift themselves their own gifts that they brought.

My real problem has 1000 different iterations/estimates for the parameters gifts_brought and gifts_received_pct and I want estimates for the total gifts received for each person for all 1000 estimates. For the purpose of this exercise I'm going to make all the estimates the same, but I want to be clear that in reality all my dataframes have different estimates for the parameters which is why I can't just do rmultinom(1000, ...).

First build the dummy code.

name <- c('Aaron', 'Susie', 'Sam', 'Emma', 'Jennifer', 'Steve')
giftsBrought <- c(5, 3, 4, 2, 3, 6)
team <- c('Sales', 'Sales', 'Sales', 'IT', 'IT', 'IT')
gifts_received_pct <- c(.2, .3, .1, .2, .2, .1) # rmultinom does not require normilazation
giftsDF <- data.frame(name, team, giftsBrought, gifts_received_pct, stringsAsFactors = FALSE)

giftsEstimationList <- list()
for(i in 1:1000){
  giftsEstimationList[[i]] <- giftsDF
}

Next, this is how I would get the gifts_received calculation for just one of the dataframes:

giftsReceivedDF <- lapply(1:nrow(giftsDF), function(i){
  probs <- giftsDF
  probs$gifts_received_pct[probs$team != giftsDF$team[i] | probs$name == giftsDF$name[i]] <- 0 # set other_team_pct and own_pct to 0
  rmultinom(1, giftsDF$giftsBrought[i], probs$gifts_received_pct)
})

Reduce(`+`, giftsReceivedDF)

I believe this is correct - when reviewing giftsReceivedDF closely, it appears that no person ever receives their own gifts, and the other team doesn't receive any of their gifts.

Where I'm stumped is how to run this through all 1000 of the dataframes in giftsEstimationList in a timely fashion. I originally tried to brute force everything with a bunch of for-loops, but I don't believe that is going to be the most efficient and time is rather important here.

1

There are 1 answers

0
jblood94 On

A function to simulate the gift exchange. It is vectorized over all teams and iterations. It only has to loop over the maximum number of participants (3 in the example data).

library(matrixStats) # for `rowCumsums` and `colCumsums`
library(data.table)

f <- function(giftsEstimationList) {
  # combine the list into a single table
  dt <- rbindlist(giftsEstimationList, TRUE, FALSE, "iter")
  # get the size of each exchange group (a team within an iteration)
  n <- dt[,.N, .(iter, team)][[3]]
  maxcol <- max(n) # the maximum sized exchange group
  # a matrix of relative probabilities of the destination probabilities of
  # each participant's gifts (row 1, column 3 = probability that a gift from
  # Aaron will go to Sam in iteration 1; row 11, column 1 = probability that
  # Jennifer's gift will go to Emma in iteration 2)
  m <- matrix( 
    unlist(
      dt[
        ,.(.(c(gifts_received_pct*(1 - diag(maxcol)),
               numeric(.N*(maxcol - .N))))),
        .(iter, team)
      ][[3]]
    ),
    nrow(dt), maxcol, 1
  )
  # get the probability to use in `rbinom`
  # (see https://en.wikipedia.org/wiki/Multinomial_distribution#Sampling_using_repeated_conditional_binomial_samples)
  m[,2:maxcol] <- m[,2:maxcol]/rowCumsums(m)[,2:maxcol]
  giftsRemaining <- dt$giftsBrought
  # distribute the gifts
  for (j in maxcol:2) {
    m[,j] <- rbinom(nrow(dt), giftsRemaining, m[,j])
    giftsRemaining <- giftsRemaining - m[,j]
  }
  
  m[,1] <- giftsRemaining
  # aggregate the gifts received for each participant
  dt[
    ,giftsReceived := diff(rbind(0, colCumsums(m)[cumsum(n),]))[
      sequence(n, 1:nrow(dt), length(n))
    ]
  ]
}

Demonstrating:

set.seed(474180891)
system.time(dt <- f(giftsEstimationList))
#>    user  system elapsed 
#>    0.06    0.00    0.06
dim(dt)
#> [1] 6000    6
# show the first 12 rows of the result
dt[1:12]
#>     iter     name  team giftsBrought gifts_received_pct giftsReceived
#>  1:    1    Aaron Sales            5                0.2             3
#>  2:    1    Susie Sales            3                0.3             7
#>  3:    1      Sam Sales            4                0.1             2
#>  4:    1     Emma    IT            2                0.2             4
#>  5:    1 Jennifer    IT            3                0.2             7
#>  6:    1    Steve    IT            6                0.1             0
#>  7:    2    Aaron Sales            5                0.2             4
#>  8:    2    Susie Sales            3                0.3             6
#>  9:    2      Sam Sales            4                0.1             2
#> 10:    2     Emma    IT            2                0.2             7
#> 11:    2 Jennifer    IT            3                0.2             3
#> 12:    2    Steve    IT            6                0.1             1