From the Bernoulli(p), I want to calculate the coverage probability for various sample sizes (n= 10, 15, 20, 25, 30, 50, 100, 150, 200), and for each sample size at p = 0.01, 0.4, and 0.8.
this is my attempt but shows 0 everywhere except for p=0.01
f3 <- function(n,probs) {
res1 <- lapply(n, function(i) {
setNames(lapply(probs, function(p) {
m<-10000
n<-i
p<-p
x <- rbinom(m,size=1,p=p)
p.hat <- x/n
lower.Wald <- p.hat - 1.96 * sqrt(p.hat*(1-p.hat)/n)
upper.Wald <- p.hat + 1.96 * sqrt(p.hat*(1-p.hat)/n)
p.in.CI <- (lower.Wald <p) & ( p < upper.Wald )
covprob1<- mean(p.in.CI)
covprob1
}),paste0("p=",probs))
})
names(res1) <- paste0("n=",n)
res1
}
f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))
Background
The code in the question attempts to run Monte Carlo simulations on Bernoulli trials to calculate coverage percentages using Wald confidence intervals. One of the problems in the code is that a number of the calculations are executed on individual observations rather than sums of successes and failures. R is primarily a vector processor and the code does not aggregate the individual observations to counts of successes and failures in order to calculate the Wald confidence intervals.
This causes the code to always generate 0 for coverage percentage for p values above 0.01 for sample sizes tested in the original post. We use code from the original post to isolate where error is introduced into the algorithm.
We set a seed, assign values to
m
,n
, andp
, and attempt to generate 10,000 Bernoulli trials of sizen
.At this point
x
is a vector containing 10,000 true = 1, false = 0 values.However,
x
is NOT 10,000 runs of samples of 5 Bernoulli trials. Given this fact, all subsequent processing by the algorithm in the original code will be incorrect.The next line of code calculates a value for
p.hat
. This should be a single value aggregated across the 5 elements in a sample, not a vector of 10,000 elements unless each element in x represents a 5 element sample.An accurate calculation for
p.hat
, treating the vector as one sample would be the following:...which is very close to the population p-value of 0.01 that we assigned earlier in the code, but still does not represent 10,000 trials of sample size 5. Instead,
p.hat
as defined above represents one Bernoulli trial with sample size 10,000.Two minor changes to fix the code
After independently developing a Monte Carlo simulator for Bernoulli trials (see below for details), it becomes clear that with a couple of tweaks we can remediate the code from the original post to make it produce valid results.
First, we multiply
m
byn
in the first argument torbinom()
, so the number of trials produced is 10,000 times sample size. We also cast the result as a matrix with 10,000 rows andn
columns.Second, we use
rowSums()
to sum the trials to counts of successes, and divide the resulting vector of 10,000 elements byn
, producing correct values forp.hat
, given sample size. Oncep.hat
is corrected, the rest of the code works as originally intended....and the output:
These results look more like what we expect from the simulation: poor coverage at low values of p / small sample sizes, where for a given p value coverage improves as sample size increases.
Starting from scratch: a basic simulator for one p-value / sample size
Here we develop a solution that iteratively builds on a set of basic building blocks: one p-value, one sample size, and a 95% confidence interval. The simulator also tracks parameters so we can combine results from multiple simulations into data frames that are easy to read and interpret.
First, we create a simulator that tests 10,000 samples of size drawn from a Bernoulli distribution with a given probability value. It aggregates successes and failures, and then calculates Wald confidence intervals, and generates an output data frame. For the purposes of the simulation, the p-values we pass to the simulator represent the the "true" population probability value. We will see how frequently the simulations include the population p-value in their confidence intervals.
We set parameters to represent a true population p-value of 0.5, a sample size of 5, and z-value of 1.96 representing a 95% confidence interval. We created function arguments for these constants so we can vary them in subsequent code. We also use
set.seed()
to make the results reproducible.A key difference between this code and the code from the original question is that we take samples of 5 from
rbinom()
and immediately sum the number of true values to calculate the number of successes. This allows us to calculateobserved_p
assuccesses / sample_size
. Now we have an empirically generated version of what was calledp.hat
in the original question.The resulting list includes a data frame summarizing the results of each trial.
We combine the list of data frames into a single data frame with
do.call()
At this point
simulation_df
is a data frame containing 10,000 rows and 6 columns. Each row represents the results from one simulation ofsample_size
Bernoulli trials. We'll print the first few rows to illustrate the contents of the data frame.Notice how the
observed_p
values are distinct values in increments of 0.2. This is because when sample size is 5, the number of TRUE values in each sample can vary between 0 and 5. A histogram ofobserved_p
makes this clear.Even with a sample size of 5, we can see the shape of a binomial distribution emerging in the histogram.
Next, we calculate the coverage percentage by summing the rows where the population p-value (represented as
p_value
) is within the Wald confidence interval.A coverage of 93.54% is a reasonable simulation, given that we calculated a 95% confidence interval. We interpret this as 93.5% of the samples generated Wald confidence intervals that included the population p-value of 0.5.
Therefore, we conclude that our simulator appears to be generating valid results. We will build on this basic design to execute simulations with multiple p-values and sample sizes.
Simulating multiple p-values for a given sample size
Next, we'll vary the probability values to see the percentage coverage for 10,000 samples of 5 observations. Since the statistics literature such as Sauro and Lewis, 2005 tells us that Wald confidence intervals have poor coverage for very low and very high p-values, we've added an argument to calculate Adjusted Wald scores. We'll set this argument to
FALSE
for the time being.The resulting list,
p_val_simulations
contains one data frame for each p-value run through the simulation.We combine these data frames and calculate coverage percentages as follows.
As expected, the coverage is very poor the further we are away from a p-value of 0.5.
However, when we rerun the simulation with
adjWald = TRUE
, we get the following results.These are much better, particularly for p-values close the the ends of the distribution.
The final task remaining is to modify the code so it executes Monte Carlo simulations at varying levels of sample size. Before proceeding further, we calculate the runtime for the code we've developed thus far.
system.time()
tells us that the code to run 5 different Monte Carlo simulations of 10,000 Bernoulli trials with sample size of 5 takes about 38 seconds to run on a MacBook Pro 15 with a 2.5 Ghz Intel i-7 processor. Therefore, we expect that the next simulation will take multiple minutes to run.Varying p-value and sample size
We add another level of
lapply()
to account for varying the sample size. We have also set theadjWald
parameter toFALSE
so we can see how the base Wald confidence interval behaves at p = 0.01 and 0.10.Elapsed time on the MacBook Pro was 217.47 seconds, or about 3.6 minutes. Given that we ran 27 different Monte Carlo simulations, the code completed one simulation each 8.05 seconds.
The final step is to process the list of lists to create an output data frame that summarizes the analysis. We aggregate the content, combine rows into data frames, then bind the resulting list of data frames.
One last step, we sort the data by p-value to see how coverage improves as sample size increases.
...and the output:
Interpreting the results
The Monte Carlo simulations illustrate that Wald confidence intervals provide poor coverage at a p-value of 0.01, even with a sample size of 200. Coverage improves at p-value of 0.10, where all but one of the simulations at sample sizes 25 and above exceeded 90%. Coverage is even better for the p-value of 0.80, where all but one of the sample sizes above 15 exceeded 91% coverage.
Coverage improves further when we calculate Adjusted Wald confidence intervals, especially at lower p-values.
The Adjusted Wald confidence intervals provide consistently better coverage across the range of p-values and sample sizes, with an average coverage of 96.72% across the 27 simulations. This is consistent with the literature that indicates Adjusted Wald confidence intervals are more conservative than unadjusted Wald confidence intervals.
At this point we have a working Monte Carlo simulator that produces valid results for multiple p-values and sample sizes. We can now review the code to find opportunities to optimize its performance.
Optimizing the solution
Following the old programming aphorism of Make it work, make it right, make it fast, working the solution out in an iterative manner helped enabled me to develop a solution that produces valid results.
Understanding of how to make it right enabled me not only to see the flaw in the code posted in the question, but it also enabled me to envision a solution. That solution, using
rbinom()
once with an argument ofm * n
, casting the result as amatrix()
, and then usingrowSums()
to calculate p-values, led me to see how I could optimize my own solution by eliminating thousands ofrbinom()
calls from each simulation.Refactoring for performance
We create a function,
binomialSimulation()
, that generates Bernoulli trials and Wald confidence intervals with a single call torbinom()
, regardless of the number of trials in a single simulation. We also aggregate results so each simulation generates a data frame containing one row describing the results of the test.We run the function with a population p-value of 0.5, a sample size of 5, and 10,000 trials and a confidence interval of 95%, and track the execution time with
system.time()
. The optimized function is 99.8% faster than the original implementation described earlier in the article, which runs in about 6.09 seconds.We will skip the intermediate steps and present the optimized version of the iteratively developed solution.
As expected, elimination of the thousands of unnecessary calls to
rbinom()
radically improves performance of the solution.Given that our prior solution ran in 217 seconds, performance of the optimized version is really impressive. Now we have a solution that not only generates accurate Monte Carlo simulations of Bernoulli trials, but it's also fast.