Writing a function for initializing parameters in R/Splus


I'd like to write a function that will create and return a set of parameters to be used in a function mySimulation I've created. Until now, I've basically been doing, e.g., mySimulation(parm1 = 3, parm2 = 4). This is now suboptimal because (1) in the actual version, the number of parameters is becoming unwieldy and (2) I'd like to keep track of different combinations of the parameters that produce the different models I'm using. So, I wrote createParms (a minimally sufficient version shown below) to do the trick. My whole approach just seems so clunky though. With all the statisticians using R, I'm sure there's a more standard way of handling my issue...right?

createParms <- function(model = "default", ...) {
  # Returns a list `parms` of parameters which will then be used in  
  # mySimulation(parms)
  #
  # Args:
  #   model: ["default" | "mymodel"] character string representation of a model 
  #          with known parameters
  #   ...: parameters of the existing `model` to overwrite.
  #        if nothing is supplied then the model parameters will be left as is. 
  #        passed variables must be named.
  #        e.g., `parm1 = 10, parm2 = 20` is good. `10, 20` is bad. 
  #
  # Returns:
  #   parms: a list of parameters to be used in mySimulation(parms)
  #          
  parms.names <- c("parm1", "parm2")
  parms <- vector(mode = "list", length = length(parms.names))
  names(parms) <- parms.names
  overwrite <- list(...)
  overwrite.names <- names(overwrite)
  if (model == "default") {
    parms$parm1 <- 0
    parms$parm2 <- 0
  } else if (model == "mymodel") {
    parms$parm1 <- 1
    parms$parm2 <- 2
  } 
  if (length(overwrite) != 0) {
    parms[overwrite.names] <- overwrite
  }
  return(parms)
}

There are 2 answers

Richie Cotton (best answer)

If the simulation function always takes the same set of arguments, then Ramnath's approach of storing them in a data frame is best. For the more general case of variable inputs to mySimulation, you should store each set of inputs in a list, probably using a list of lists for running several simulations.

The idea behind your createParms function looks sound; you can simplify the code a little bit.

createParms <- function(model = "default", ...) 
{
  #default case
  parms <- list(
    parm1 = 0,
    parm2 = 0
  )

  #other special cases
  if(model == "mymodel")
  {
    parms <- within(parms,     
    {
      parm1 <- 1
      parm2 <- 2
    })  
  }

  #overwrite from ...
  dots <- list(...)
  parms[names(dots)] <- dots

  parms
}

Test this with, e.g.,

createParms()
createParms("mymodel")  
createParms("mymodel", parm2 = 3)
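A possible variation on the function above, offered only as a sketch: match.arg rejects unknown model names, switch selects the base parameters, and modifyList (from the utils package, which is attached by default) applies the overrides. The name createParms2 and the check for unknown or unnamed overrides are illustrative additions, not part of the original answer.

```r
createParms2 <- function(model = c("default", "mymodel"), ...) {
  # match.arg() picks the first choice when `model` is missing and
  # stops with an error if `model` is not one of the listed choices
  model <- match.arg(model)

  # base parameter set for each known model
  parms <- switch(model,
    default = list(parm1 = 0, parm2 = 0),
    mymodel = list(parm1 = 1, parm2 = 2)
  )

  dots <- list(...)
  if (length(dots) > 0) {
    # hypothetical safety check: overrides must be named and must
    # refer to parameters that already exist in `parms`
    unknown <- setdiff(names(dots), names(parms))
    if (is.null(names(dots)) || any(names(dots) == "") || length(unknown) > 0) {
      stop("overrides must be named existing parameters")
    }
    # replace the matching elements of `parms` with the overrides
    parms <- modifyList(parms, dots)
  }
  parms
}
```

With this version, a misspelled model or parameter name fails early instead of silently producing an unexpected list. Whether that strictness is wanted depends on how the simulation is used.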

do.call may come in handy for running your simulation, as in

do.call(mySimulation, createParms())

EDIT: What do.call does for you

If you have parms <- createParms(), then

do.call(mySimulation, parms)

is the same as

with(parms, mySimulation(parm1 = parm1, parm2 = parm2))

The main advantage is that you don't need to spell out each parameter that you are passing into mySimulation (or to modify that function to accept the parameters in list form).
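To make the name matching concrete, here is a small sketch. The body of mySimulation is a stand-in, and it is assumed that its arguments are named parm1 and parm2 to match the list returned by createParms(). The second half shows the list-of-lists idea for running several parameter sets; the names runs and results are illustrative.

```r
# stand-in for the real simulation; argument names are assumed to
# match the names in the parameter list
mySimulation <- function(parm1, parm2) {
  parm1 + 10 * parm2
}

parms <- list(parm2 = 2, parm1 = 1)  # deliberately out of order

# do.call() matches list elements to arguments by name
a <- do.call(mySimulation, parms)
b <- mySimulation(parm1 = parms$parm1, parm2 = parms$parm2)
identical(a, b)

# a list of parameter lists runs several simulations in one go
runs <- list(
  default = list(parm1 = 0, parm2 = 0),
  mymodel = list(parm1 = 1, parm2 = 2)
)
results <- lapply(runs, function(p) do.call(mySimulation, p))
```

Because matching is by name, the order of elements in the list does not matter, but every name in the list must correspond to an argument of the function (or the function must accept ...).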

Ramnath

I think if you know the combination of parameters to be used for each model, then it is better to create a data frame of model names and parameters as shown below

# create a data frame with model names and parameters
# NOTE: I am assuming all models have the same number of parameters
# if they differ, store the models as a list instead

model = c('default', 'mymodel');
parm1 = c(0.5, 0.75);
parm2 = c(1, 2);

models.df = data.frame(model, parm1, parm2)

You can now simulate any of the models by passing it as an argument to your mySimulation function. I have used a dummy simulation example, which you can replace with your code.

# function to run simulation based on model name

mySimulation = function(model = 'default'){

  # find row corresponding to model of interest
  mod.row = match(model, models.df$model)

  # extract parameters corresponding to model
  parms   = models.df[mod.row, -1]

  # run dummy simulation of drawing normal random variables
  sim.df  = rnorm(100, mean = parms[,1], sd = parms[,2])
  return(sim.df)

}

If you now want to run all your simulations in one step, you can use the excellent plyr package and invoke

library(plyr)
sim.all = ldply(models.df$model, mySimulation)

If your simulations return different numbers of values, you can use the function llply instead of ldply, which returns a list rather than a data frame.
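As a rough illustration of the unequal-length case, the sketch below uses base R's lapply as a stand-in for llply so that it runs without plyr; llply(models.df$model, mySimulation2) should give a comparable list. The function mySimulation2 and its model-dependent sample sizes are made up for the example.

```r
# same model table as above, rebuilt here so the example is self-contained
models.df <- data.frame(
  model = c("default", "mymodel"),
  parm1 = c(0.5, 0.75),
  parm2 = c(1, 2),
  stringsAsFactors = FALSE
)

# hypothetical simulation whose output length depends on the model
mySimulation2 <- function(model = "default") {
  mod.row <- match(model, models.df$model)
  parms <- models.df[mod.row, -1]
  n <- if (model == "default") 5 else 8
  rnorm(n, mean = parms$parm1, sd = parms$parm2)
}

# one list element per model; elements may differ in length
sim.list <- lapply(models.df$model, mySimulation2)
names(sim.list) <- models.df$model
sapply(sim.list, length)
```

A list is the natural container here because a data frame would force every simulation to return the same number of values.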

If you provide more information about the return values of your simulation and details on what it does, this code can be easily tweaked to get what you want.

Let me know if this works