Using predictNLS to create confidence intervals around fitted values in R?

683 views Asked by At

I want to build confidence intervals around a large set of fitted values using predictNLS from the propogate package in R. As an example, I will use the data set they reference in the function description (https://rdrr.io/github/anspiess/propagate/man/predictNLS.html), DNase, and building a model that takes the values conc and density as features:

library(propogate)
library(dplyr)
library(modelr)

DNase <- DNase

modeldna <- DNase %>% group_by(Run) %>% 
  do(run_model = nls(density ~ a * exp(b * conc), 
start = list(a = 1 , b = 0.5), 
data = .)) %>% ungroup()

I then want to give each row the model that it is assigned to so that predictions can be added:

DNApredict <- full_join(as_tibble(DNase), modeldna, by = "Run")

Add in the predictions:

DNApredict <- DNApredict %>% 
  group_by(Run) %>% 
  do(add_predictions(., var = "predicted_density", first(.$run_model)))

And then, I want to add the confidence interval data that predictNLS seems to provide, by giving it that same data and asking it to give a confidence interval for each fitted point in the predicted_density column:

confidence_interval <- predictNLS(model = modeldna, newdata = DNApredict$predicted_density, interval = "confidence")

However, the following error arises:

Error in as.list(object$call$formula) : argument "object" is missing, with no default

Does anyone know what might be causing this? I know that it will likely seem obvious to some of you what the object it is calling is, so I apologize if this is a ridiculous question. I am really hoping to be able to use this function to create confidence intervals around a series of fitted values. Thank you very much in advance.

1

There are 1 answers

1
Allan Cameron On BEST ANSWER

Since you are running an nls on each Run in the sample data set, it is easy to get a list of nls models by splitting each run into its own data frame, and running nls on each data frame using lapply

library(propagate)

DNase <- DNase

modeldna <- DNase %>% split(DNase$Run)

models <- lapply(modeldna, function(d) nls(density ~ a * exp(b * conc), 
                                           start = list(a = 1 , b = 0.5), 
                                           data = d))

Now we can get predictions for each point in each model just as easily by running predictNLS on each model (again inside lapply)

results <- lapply(seq_along(modeldna), function(i) {
 predictNLS(models[[i]], newdata = data.frame(conc = modeldna[[i]]$conc))
})

Because of the output structure of predictNLS, we need to extract the predictions for each row and coerce them into a data frame:

predictions <- lapply(results, function(x) {
  as.data.frame(do.call(rbind, lapply(x$prop, function(y) y$prop)))})

Finally, we can stick our predictions (including confidence intervals) back onto the original data frame:

all_results <- do.call(rbind, lapply(seq_along(modeldna), 
                      function(i) cbind(modeldna[[i]], predictions[[i]])))

This now gives us a complete data frame of original data points, and the relevant predictions with confidence intervals.

To show this, we can plot the results in ggplot. Here we show one plot for each run, including its original data, the predicted value as a dotted line, and the 95% confidence limit as a pale blue ribbon:

library(ggplot2)

ggplot(all_results, aes(x = conc, y = density)) +
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), 
              fill = "deepskyblue4", alpha = 0.2) +
  geom_point() +
  geom_line(aes(y = Mean.1), linetype = 2) +
  facet_wrap(.~factor(Run, levels = 1:11)) +
  theme_bw()

enter image description here