How do I extract parameters from listed models in R?

1.5k views Asked by At

I am trying to extract the parameters of a model I have designed for my work, but am having trouble doing so. I will use the iris data provided with R as an example data set.

I pull in the data:

library(dplyr)
df <- iris

I design a model that takes Sepal Width and Length, and determines what the best model fit given Sepal Length is to determine Sepal Width (this is probably a bad example of a data set, and I didn't pay much attention to the starting parameters, but I am almost certain this shouldn't affect how hard it is to answer this question). One model is produced for each Species:

sepalmodel <- df %>% 
  group_by(Species) %>% 
  do(model = nls(Sepal.Width ~ a*exp(Sepal.Length*b), start = list(a = 0.5, b = 0.9), data = .)) %>% 
  ungroup()

This produces a dataframe with one column being Species, and the other being model. In the model column, each model is represented in each row by a large list, which I have trouble extracting its basic data from. I would like to work with the parameters directly, as a more clean and reproducible way of representing and generating data with these models. How can I extract the a and b parameters from these models? Is there a function that does so, or do I need to dig into the code of each model? Also, is there any data built into that list that associates it with the specific species, or is it only directly associated with the species by nature of being found in the same row?

3

There are 3 answers

1
nniloc On BEST ANSWER

One option is to use broom::tidy with tidyr::unnest.

library(tidyverse)

iris %>% 
  group_by(Species) %>% 
  do(model = nls(Sepal.Width ~ a*exp(Sepal.Length*b), start = list(a = 0.5, b = 0.9), data = .)) %>% 
  ungroup() %>%
  mutate(tidy_nls = lapply(model, broom::tidy)) %>%
  unnest(tidy_nls)

#------
# A tibble: 6 x 7
  Species    model  term  estimate std.error statistic  p.value
  <fct>      <list> <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 setosa     <nls>  a       1.07      0.162       6.58 3.23e- 8
2 setosa     <nls>  b       0.232     0.0299      7.76 5.12e-10
3 versicolor <nls>  a       1.41      0.228       6.18 1.31e- 7
4 versicolor <nls>  b       0.113     0.0269      4.21 1.11e- 4
5 virginica  <nls>  a       1.80      0.261       6.87 1.17e- 8
6 virginica  <nls>  b       0.0764    0.0218      3.51 9.97e- 4
0
G. Grothendieck On

Here are two approaches.

1) nlme This package comes with R so it does not have to be installed. The formula can be specified as usual followed by a vertical bar and the grouping variable allowing all models to be specified at once.

library(nlme)
fm <- nlsList(Sepal.Width ~ a * exp(Sepal.Length*b) | Species, data = iris, 
  start = list(a = 0.5, b = 0.5))
co <- coef(summary(fm))
co

giving this array so that, for example, co["setosa",,], co[, "Estimate", ], co[,,"a"] give matrices for setosa, for all estimates and for all "a" coefficients. The entire array is:

    , , a

               Estimate Std. Error  t value     Pr(>|t|)
    setosa     1.068204  0.1729358 6.176882 3.226364e-08
    versicolor 1.412344  0.2302122 6.134967 1.314458e-07
    virginica  1.795394  0.2453925 7.316418 1.170751e-08
    
    , , b
    
                 Estimate Std. Error  t value     Pr(>|t|)
    setosa     0.23226126 0.03189911 7.281121 5.122595e-10
    versicolor 0.11319887 0.02708865 4.178831 1.107640e-04
    virginica  0.07643349 0.02046418 3.734989 9.965872e-04

ftable can be used to view this array in various 2d ways. Also this link provides a function ftable2df that can convert an ftable to a data frame. For an example of using ftable:

ftable(co, row.vars = 3:2)

giving:

                    setosa   versicolor    virginica
                                                    
a Estimate    1.068204e+00 1.412344e+00 1.795394e+00
  Std. Error  1.729358e-01 2.302122e-01 2.453925e-01
  t value     6.176882e+00 6.134967e+00 7.316418e+00
  Pr(>|t|)    3.226364e-08 1.314458e-07 1.170751e-08
b Estimate    2.322613e-01 1.131989e-01 7.643349e-02
  Std. Error  3.189911e-02 2.708865e-02 2.046418e-02
  t value     7.281121e+00 4.178831e+00 3.734989e+00
  Pr(>|t|)    5.122595e-10 1.107640e-04 9.965872e-04

and as another example

ftable(co, row.vars = c(1, 3))

giving:

                  Estimate   Std. Error      t value     Pr(>|t|)
                                                                 
setosa     a  1.068204e+00 1.729358e-01 6.176882e+00 3.226364e-08
           b  2.322613e-01 3.189911e-02 7.281121e+00 5.122595e-10
versicolor a  1.412344e+00 2.302122e-01 6.134967e+00 1.314458e-07
           b  1.131989e-01 2.708865e-02 4.178831e+00 1.107640e-04
virginica  a  1.795394e+00 2.453925e-01 7.316418e+00 1.170751e-08
           b  7.643349e-02 2.046418e-02 3.734989e+00 9.965872e-04

Also fm and summary(fm) are S3 objects, i.e. lists with a class variable. The components of the lists are available and can be viewed via:

str(fm)
str(summary(fm))

fm has class vector c("nlsList", "lmList") so you can find all the methods that are available for acting on fm like this:

methods(class = "nlsList")
methods(class = "lmList")

In fact fm has one component for each level of the grouping variable and each of those components is an nls object and we can get the methods that can be applied to nls objects like this:

methods(class = "nls")

Similarly summary(fm) has class vector c("summary.nlsList", "summary.lmList") and so we can find the methods that can be applied to it using:

methods(class = "summary.nlsList")
methods(class = "summary.lmList")

2) nls Using plain nls we can create a single model by specifying the grouping variable as an index on each parameter. The starting values is a list containing a vector component for each parameter having an entry for each level of the grouping vector. Species has 3 levels so we have 3 starting values for a and 3 for b.

st <- list(a = rep(0.5, 3), b = rep(0.5, 3))
fm <- nls(Sepal.Width ~ a[Species] * exp(Sepal.Length*b[Species]), iris, start = st)
co <- coef(summary(fm))
co

giving this matrix:

     Estimate Std. Error  t value     Pr(>|t|)
a1 1.06820417 0.17293582 6.176882 6.329948e-09
a2 1.41233648 0.23021095 6.134967 7.804013e-09
a3 1.79539324 0.24539239 7.316418 1.638244e-11
b1 0.23226125 0.03189911 7.281121 1.983862e-11
b2 0.11319979 0.02708865 4.178864 5.058757e-05
b3 0.07643355 0.02046418 3.734992 2.697815e-04

You could consider changing the row names to:

rownames(co) <- c(t(outer(c("a", "b"), levels(iris$Species), paste, sep = ".")))
co

giving:

               Estimate Std. Error  t value     Pr(>|t|)
a.setosa     1.06820417 0.17293582 6.176882 6.329948e-09
a.versicolor 1.41233648 0.23021095 6.134967 7.804013e-09
a.virginica  1.79539324 0.24539239 7.316418 1.638244e-11
b.setosa     0.23226125 0.03189911 7.281121 1.983862e-11
b.versicolor 0.11319979 0.02708865 4.178864 5.058757e-05
b.virginica  0.07643355 0.02046418 3.734992 2.697815e-04

As in (1) we can inspect the components of fm using str(fm) and can find the methods that can act on fm using methods(class = "nls") .

0
thehand0 On

More generally, if you want to extract parameters from a model, you can do so in a similar manner to selecting columns from a dataframe. For example, in the case of nls you can type ?nls to check what you can return from the model (look under values). For example, nls$model to get the model name, nls$weights to get the weights and so on. Notably, the same approach can be used for correlations. You can type ?cor.test (and look under values) to see the pAramters you can extract. cor.test(x,y)$estimate