R calculate the standard error using bootstrap


I have this array of values:

> df
[1] 2 0 0 2 2 0 0 1 0 1 2 1 0 1 3 0 0 1 1 0 0 0 2 1 2 1 3 1 0 0 0 1 1 2 0 1 3
[38] 1 0 2 1 1 2 2 1 2 2 2 1 1 1 2 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0
[75] 0 0 0 0 0 1 1 0 1 1 1 1 3 1 3 0 1 2 2 1 2 3 1 0 0 1

I want to use the boot package to calculate the standard error of the data, following http://www.ats.ucla.edu/stat/r/faq/boot.htm

So I ran these commands:

library(boot)
boot(df, mean, R=10)

and I got this error:

Error in mean.default(data, original, ...) : 
'trim' must be numeric of length one

Can someone help me figure out the problem? Thanks


There are 3 answers

1
Metrics On BEST ANSWER

If you are bootstrapping the mean you can do as follows:

set.seed(1)
library(boot)
x <- rnorm(100)
# The statistic must accept the data and a vector of resample indices
meanFunc <- function(x, i) mean(x[i])
bootMean <- boot(x, meanFunc, R = 100)
bootMean

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = x, statistic = meanFunc, R = 100)


Bootstrap Statistics :
     original      bias    std. error
t1* 0.1088874 0.002614105  0.07902184

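If you pass mean directly as the statistic, you get the same error you saw. boot calls the statistic with the data as the first argument and the resample indices as the second, and the second argument of mean is trim, so the whole index vector ends up in trim: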

bootMean <- boot(x,mean,100)
Error in mean.default(data, original, ...) : 
  'trim' must be numeric of length one
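
Applied to the data in the question, the same pattern might look like the sketch below. The vector is typed in from the printed output above, and mean_stat, se_boot and se_formula are names I made up for illustration. The value that boot prints as "std. error" is the standard deviation of the bootstrap replicates stored in the t component of the result.

```r
library(boot)

# The question's data, retyped from the printed vector (100 values)
df <- c(2, 0, 0, 2, 2, 0, 0, 1, 0, 1, 2, 1, 0, 1, 3, 0, 0, 1, 1, 0,
        0, 0, 2, 1, 2, 1, 3, 1, 0, 0, 0, 1, 1, 2, 0, 1, 3,
        1, 0, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 3, 0,
        1, 2, 2, 1, 2, 3, 1, 0, 0, 1)

# Hypothetical statistic: mean of the resampled observations
mean_stat <- function(data, idx) mean(data[idx])

set.seed(1)
b <- boot(df, mean_stat, R = 1000)

# Bootstrap standard error: sd of the R resampled means
se_boot <- sd(b$t[, 1])

# Textbook formula for the standard error of the mean, for comparison
se_formula <- sd(df) / sqrt(length(df))

c(bootstrap = se_boot, formula = se_formula)
```

With R = 10 as in the question the estimate would be very noisy, so a larger number of replicates is probably a better choice.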
0
John On

Passing mean directly is not sufficient for boot. If you look at the help for boot, you'll see that your function must be able to receive the data and an index. So you need to write your own function. Furthermore, it should return the value that you want the standard error of, such as the mean.
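
To illustrate the point about the data and the index, here is a small sketch that does not need boot at all. The toy vector x and the wrapper name mean_stat are assumptions for illustration only. It shows what happens when an index vector lands in the second argument of mean, and how a wrapper with a (data, indices) signature avoids that.

```r
x <- c(2, 0, 1, 3, 1)

# Calling mean(data, indices) matches the index vector to mean's second
# argument, trim, which must be a single number, so this call fails.
res <- tryCatch(mean(x, seq_along(x)), error = function(e) e)
inherits(res, "error")          # TRUE

# Hypothetical wrapper: subset the data by the indices, then take the mean
mean_stat <- function(data, idx) mean(data[idx])

mean_stat(x, seq_along(x))      # 1.4, the mean of the original sample
mean_stat(x, c(1, 1, 2, 5, 5))  # 1.2, the mean of one possible resample
```

Any other statistic can be wrapped the same way, for example function(data, idx) median(data[idx]).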

0
PascalVKooten On

I never really used boot, since I do not understand what it will bring to the table.

Given that the standard error of the mean is estimated as:

sd(sampled.df) / sqrt(length(df))

I believe you can simply use the following function to get this done:

custom.boot <- function(times, data = df) {
  n <- length(data)
  boots <- numeric(times)
  for (i in seq_len(times)) {
    # Resample with replacement, then apply the standard error formula
    boots[i] <- sd(sample(data, n, replace = TRUE)) / sqrt(n)
  }
  boots
}

The function returns one standard error estimate per resample, so you can average them yourself:

# Mean standard error
mean(custom.boot(times=1000))
[1] 0.08998023

Some years later...

I think this is nicer:

mean(replicate(1000, sd(sample(df, replace=TRUE))/sqrt(length(df))))
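
For what it's worth, the usual bootstrap estimate of the standard error is the standard deviation of the resampled means, which is not quite the same thing as averaging the formula over resamples. A rough sketch comparing the two, using made-up Poisson data as a stand-in for df (x, n_boot and the se_* names are assumptions for illustration):

```r
set.seed(42)
x <- rpois(100, lambda = 1)  # stand-in data, not the vector from the question
n_boot <- 2000

# Approach from this answer: average the formula over resamples
se_avg_formula <- mean(replicate(
  n_boot, sd(sample(x, replace = TRUE)) / sqrt(length(x))
))

# Usual bootstrap standard error: sd of the resampled means
se_sd_of_means <- sd(replicate(n_boot, mean(sample(x, replace = TRUE))))

# Plain formula on the original sample, no resampling
se_plain <- sd(x) / sqrt(length(x))

c(avg_formula = se_avg_formula,
  sd_of_means = se_sd_of_means,
  plain = se_plain)
```

For the mean all three should come out close to each other. The resampling versions only start to matter for statistics that have no simple standard error formula.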