Sampling a matrix with conditions (no zeros or repeated columns)

291 views Asked by At

In case you are interested in the background of the question, I'm thinking how to solve this post- incidentally, if you solve it there, I'll just erase this question. Ideally, I'd like to get an analytical or algebraic solution (constrained non-capturing rook problem), but short of that I'd like a simulation. Incidentally, I posted a related question without as much detail, in case it is easier to tackle.


But you don't have to leave this page. Basically there are pairings of two lists of soccer teams, and some pairings are good, while others are forbidden by the rules. This is the matrix:

enter image description here

So to generate multiple samplings to match the teams on the row names (to the left) with the column names of opposing teams (at the top), I have to come up with a conditional sampling procedure, but I have no clue how to.

This is what I have attempted so far:

BCN = c(0,2,3,4,0,0,7,8)
ATL = c(0,0,3,4,5,0,7,8)
DOR = c(0,0,3,4,5,6,7,0)
MON = c(1,2,3,0,5,6,7,0)
ARS = c(1,2,3,0,0,6,7,8)
LEI = c(1,2,3,4,0,6,0,8)
JUV = c(1,2,3,4,5,0,7,8)
NAP = c(1,2,0,4,5,6,7,8)

chessboard = t(as.matrix(data.frame(BCN, ATL, DOR, MON, ARS, LEI, JUV, NAP)))
colnames(chessboard) = c("MAD", "BYN", "BEN", "PSG", "MCY", "SEV", "OPO", "LEV")
chessboard
        MAD BYN BEN PSG MCY SEV OPO LEV
  BCN   0   2   3   4   0   0   7   8
  ATL   0   0   3   4   5   0   7   8
  DOR   0   0   3   4   5   6   7   0
  MON   1   2   3   0   5   6   7   0
  ARS   1   2   3   0   0   6   7   8
  LEI   1   2   3   4   0   6   0   8
  JUV   1   2   3   4   5   0   7   8
  NAP   1   2   0   4   5   6   7   8

match = function(){
vec = rep(0,8)
for(i in 1:8){
tryCatch({vec[i] = as.numeric(sample(as.character(chessboard[i,][!(chessboard[i,] %in% vec) & chessboard[i,] > 0]),1))
last=chessboard[8,][!(chessboard[8,] %in% vec) & chessboard[i,] > 0]
},error=function(e){})
}
vec
}
match()

set.seed(0)
nsim = 100000
matches = t(replicate(nsim, match()))
matches = subset(matches, matches[,8]!=0)
colnames(matches) = c("BCN", "ATL", "DOR", "MON", "ARS", "LEI", "JUV", "NAP")
head(matches)

table = apply(matches, 2, function(x) table(x)/nrow(matches))
table
$BCN
x
        2         3         4         7         8 
0.1969821 0.2125814 0.1967272 0.1967166 0.1969927 

$ATL
x
        3         4         5         7         8 
0.2016226 0.1874462 0.2357732 0.1875737 0.1875843 

$DOR
x
        3         4         5         6         7 
0.1773264 0.1686188 0.2097673 0.2787270 0.1655605 

$MON
x
        1         2         3         5         6         7 
0.2567882 0.2031199 0.1172017 0.1341921 0.1789617 0.1097365 

$ARS
x
        1         2         3         6         7         8 
0.2368882 0.1907169 0.1104480 0.1651358 0.1026112 0.1941999 

$LEI
x
        1         2         3         4         6         8 
0.2129743 0.1717302 0.1019210 0.1856410 0.1511081 0.1766255 

$JUV
x
         1          2          3          4          5          7          8 
0.15873252 0.12940289 0.07889902 0.14203948 0.22837179 0.12845781 0.13409648 

$NAP
x
        1         2         4         5         6         7         8 
0.1346168 0.1080481 0.1195272 0.1918956 0.2260675 0.1093436 0.1105011 
1

There are 1 answers

1
sirallen On

Maybe try this:

matches = setNames(as.list(rep(NA,8)), rownames(mat))
set.seed(1)    

# For each row, sample a column, then drop that column.
# 'sample.int' will automatically renormalize the probabilities.
for (i in sample.int(8)) {
  team_i = rownames(mat)[i]

  j = sample.int(ncol(mat), 1, prob=mat[i,])

  matches[[team_i]] = colnames(mat)[j]

  mat = mat[,-j,drop=FALSE]
}

> matches
# $Barcelona
# [1] "Oporto"
# 
# $Atletico
# [1] "Benfica"
# 
# $Dortmund
# [1] "Paris"
# 
# $Juventus
# [1] "City"
# 
# $Arsenal
# [1] "Sevilla"
# 
# $Napoli
# [1] "Leverkusen"
# 
# $Monaco
# [1] "Bayern"
# 
# $Leicester
# [1] "Madrid"

Might be a good idea to add restrictions so you don't end up with a row of zeros.