I have trouble implementing a function in the boot
library.
The function I want to implement is the following
fsyn = function(x) sum( x[1,] == x [2,] )
It's a count of the matches between two sequences.
My data is a set of sequences such as
id V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 c a c b c c b d d a
2 1 c d a a c b d a b a
3 2 b d c b b b c d a b
4 2 b a b c b c d b a d
Something important about these sequences is the fact that they are paired by id
.
I am interested in doing two things, the first is to bootstrap the number of matches by id
and the second for two random individuals.
The first procedure can be implemented by
library(dplyr)
chid = df$id
# sampling paired sequences #
wchid = function(chid) which(chid %in% sample(chid, 1))
# the matches function #
fsyn = function(x) sum( x[1,] == x [2,] )
# wrapping the function #
funcHamC = function(df) df[wchid(chid), -1] %>% fsyn
with
df %>% funcHamC
The second function can simply be written like
funcHamR = function(df) df[sample(df$id, 2), -1] %>% fsyn
df %>% funcHamR
However, I have issues using these two functions with boot
.
library(boot)
boot(df, funcHamC, R = 10)
boot(df, funcHamR, R = 10)
This isn't working. Any idea ?
data
df = as.data.frame( t(replicate(20, sample(letters[1:4], 10, T))) )
df$id = rep(1:10, 2)
df = df %>% select(id, everything()) %>% arrange(id)
The
boot
function expects two arguments to thestatistic
function—the second being a parameter to specify the sample values to choose. Because you are using your own methods to choose randomly from the data, you should set thesim
argument to'parametric'
. This will use theran.gen
argument to specify a function to generate random values from the data.To quote from the help file: "If
ran.gen
is not specified, the default is a function which returns the originaldata
in which case all simulation should be included as part ofstatistic
."Save the boot output to variables—such as
C.boot
andR.boot
and you will find the samples inC.boot$t
andR.boot$t
.You can then obtain the stats you need from the generated values.