Faster alternative to populating a pre-allocated data frame using a for-loop


I am running a few different Monte Carlo simulations, all of which involve generating some data, fitting a model, and capturing several output variables from the fit of the model. Typically data are generated so that several characteristics vary (e.g., number of items, sample size), and models are fit so that several other characteristics vary (e.g., estimation method, model misspecification). I have no questions about generating the data or about how to actually fit the model. However, I know my method for populating my results data frame is very inefficient and I would like some help in improving this. My usual method is as follows:

1) Create a data frame with one row per model I have to fit (e.g., number of iterations * number of item counts * number of sample sizes * number of estimation methods * number of model misspecification types), and with as many columns as I need to hold the identifying variables (sample size, number of items, etc.) and to capture all the output.

2) Use a for-loop to identify the particular combination of conditions and the particular iteration of said combination, fit the model, and populate the appropriate row of the data frame.
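As a rough sketch of step 1, this is roughly how the pre-allocated data frame could be built with expand.grid. The condition levels and the number of iterations below are made up for illustration and are not my actual design; only the column names follow my real data frame.

```r
# Hypothetical condition levels, chosen only for illustration.
n.sample <- c(100, 250)
n.item   <- c(3, 6)
estim    <- c("ML", "ULS")
n.iter   <- 5

# One row per combination of conditions and iteration.
df <- expand.grid(
  iteration = seq_len(n.iter),
  estim     = estim,
  n.item    = n.item,
  n.sample  = n.sample,
  stringsAsFactors = FALSE
)

# Result columns start out as NA and are filled in later.
df$df.chisq  <- NA_real_
df$obt.chisq <- NA_real_

head(df)
```

Each row then carries everything needed to identify one model fit, and the NA columns are the ones the loop in step 2 has to populate.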

So I might start with something that looks like:

> head(df)
    fit.model n.sample n.item distr.cond estim iteration df.chisq obt.chisq
1           1      100      3          1    ML         1       NA        NA
1.1         1      100      3          1    ML         2       NA        NA
1.2         1      100      3          1    ML         3       NA        NA
1.3         1      100      3          1    ML         4       NA        NA
1.4         1      100      3          1    ML         5       NA        NA
1.5         1      100      3          1    ML         6       NA        NA

where the first six columns identify each row uniquely and the last two columns capture results and need to be filled in. I then use a for-loop to go row by row: pick out the identifying characteristics of that row (which lets me locate the appropriate data file and read it in, and tells me how to fit the model), fit the model, and write the desired output to the NA columns, for instance using df$obt.chisq[i] <- fitMeasures(fit,"chisq"), where the function fitMeasures extracts the particular value from the fitted model fit.
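To make the pattern concrete, here is a minimal, self-contained sketch of the row-by-row loop. The function fit_one is a hypothetical stand-in for "read the data file and fit the model": it simulates counts and runs a base R chisq.test, purely so the example runs on its own. My real code fits a model and calls fitMeasures instead, and the condition levels here are invented.

```r
set.seed(1)

# Small pre-allocated results frame (hypothetical conditions).
df <- expand.grid(iteration = 1:3, n.item = c(3, 4), n.sample = c(100, 200))
df$df.chisq  <- NA_real_
df$obt.chisq <- NA_real_

# Hypothetical stand-in for reading the data and fitting the model:
# simulate category counts and run a goodness-of-fit test on them.
fit_one <- function(n.sample, n.item) {
  counts <- tabulate(sample(n.item, n.sample, replace = TRUE), nbins = n.item)
  chisq.test(counts)
}

# Row-by-row population of the NA columns (the slow pattern in question).
for (i in seq_len(nrow(df))) {
  fit <- fit_one(df$n.sample[i], df$n.item[i])
  df$df.chisq[i]  <- unname(fit$parameter)  # degrees of freedom
  df$obt.chisq[i] <- unname(fit$statistic)  # obtained chi-square
}

head(df)
```

Every pass through the loop assigns into the data frame one cell at a time, which is the part I suspect is inefficient.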

Is it possible to vectorize this? I forget the terminology, but I recognize that each iteration is completely independent of every other iteration, so the order in which they run doesn't matter. It's time for a change in approach! Any help would be much appreciated.
