Apply different start parameters to model using purrr::map within dplyr::mutate

I'm trying to answer someone's question on the ggplot2 mailing list and can't figure it out: https://groups.google.com/forum/#!topic/ggplot2/YgCqQX8JbPM

OP wants to apply different start parameters to subsets of his data for the nls model. My thoughts were that he should read up on dplyr and purrr, but after a few hours of trying myself I've hit a wall. Unsure if it's a bug or my lack of experience with purrr.

library(tidyverse)

# input dataset
df <- data.frame(Group = c(rep("A", 7), rep("B", 7), rep("C", 7)),
                 Time = c(rep(c(1:7), 3)),
                 Result = c(100, 96.9, 85.1, 62.0, 30.7, 15.2, 9.6, 
                            10.2, 14.8, 32.26, 45.85, 56.25, 70.1, 100,
                            100, 55.61, 3.26, -4.77, -7.21, -3.2, -5.6))

# nest the datasets for computing models
df_p <-
df %>%
group_by(Group) %>%
nest

# add the start values as columns, one row per group
df_p$starta <- c(-3, 4, -3)
df_p$startb <- c(85, 85, 85)
df_p$startc <- c(4, 4, 4)
df_p$startd <- c(10, 10, 10)

# compute models using nls
df_p %>%
  mutate(model2 = map(data, ~ nls(Result ~ a + (b - a) / (1 + (Time / c)^d),
                                  data = .,
                                  start = c(a = starta, b = startb,
                                            c = startc, d = startd))))

#Error in mutate_impl(.data, dots) : 
#  parameters without starting value in 'data': a, b, d

This feels related to the bug below, but that was fixed a while ago: https://github.com/hadley/dplyr/issues/1447

From what I can tell, it's looking for the variables in the scope of the nested tibble, but I want it to be in the scope of the mutate call. I don't know if there is a way around this.
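
A small base-R sketch that may help narrow this down. It assumes df_p is ungrouped at that point, so that starta inside mutate() is the full three-element column rather than one value per row. In that case c(a = starta, ...) no longer produces an element named a, which would match the "parameters without starting value" message. The object names whole_columns and one_group are made up for illustration.

```r
# Start columns copied from the question (one value per group).
starta <- c(-3, 4, -3)
startb <- c(85, 85, 85)

# c() appends an index to the name when a named argument has length > 1,
# so there is no element called "a" or "b" for nls() to find.
whole_columns <- c(a = starta, b = startb)
names(whole_columns)
#> [1] "a1" "a2" "a3" "b1" "b2" "b3"

# Taking one element per group keeps the plain parameter names.
one_group <- c(a = starta[1], b = startb[1])
names(one_group)
#> [1] "a" "b"
```

If this is what is happening, the fix is to hand each model only its own row of start values instead of the whole column.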

There are 2 answers

jennybryan (best answer)

The example data is tricky because Group B basically has time in reverse. Finding good initial values for that is not my problem. So I made up new data for Group B. Here's how to set up a data frame in order to apply nls() inside of map2().


library(tidyverse)

df <- data.frame(Group = c(rep("A", 7), rep("B", 7), rep("C", 7)),
                 Time = c(rep(c(1:7), 3)),
                 Result = c(100, 96.9, 85.1, 62.0, 30.7, 15.2, 9.6, 
                            ## I replaced these values!!
                            ## Group B initial values are NOT MY PROBLEM
                            105, 90, 82, 55, 40, 23, 7, 
                            100, 55.61, 3.26, -4.77, -7.21, -3.2, -5.6))

## ggplot(df, aes(x = Time, y = Result, group = Group)) + geom_line()

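df_p <-
  df %>%
  group_by(Group) %>%
  nest() %>%
  ## newer tidyr keeps the grouping after nest(); drop it so that the
  ## three-element list below fills the whole column
  ungroup() %>%
  ## init vals are all the same, but this shows how to make them different
  mutate(start = list(
    list(a = -3, b = 85, c = 4, d = 10),
    list(a = -3, b = 85, c = 4, d = 10),
    list(a = -3, b = 85, c = 4, d = 10)
  ))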

df_p %>%
  mutate(model2 = map2(data, start,
                       ~ nls(Result ~ a+(b-a)/(1+(Time/c)^d),
                             data = .x, start = .y)))
#> # A tibble: 3 × 4
#>    Group             data      start    model2
#>   <fctr>           <list>     <list>    <list>
#> 1      A <tibble [7 × 2]> <list [4]> <S3: nls>
#> 2      B <tibble [7 × 2]> <list [4]> <S3: nls>
#> 3      C <tibble [7 × 2]> <list [4]> <S3: nls>
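
If the start values already live in separate columns, as in the question, one possible way to get the same list-column is purrr::pmap(). The sketch below is an illustration rather than part of the answer: it reuses the replaced Group B data, keeps the column names starta to startd from the question, sets every group to the same values the answer uses, and introduces fits as a made-up name for the result. The ungroup() call is there on the assumption that a recent tidyr is installed, where nest() keeps the grouping.

```r
library(tidyverse)

# Data as in the answer above (Group B values are the replaced ones).
df <- data.frame(
  Group = rep(c("A", "B", "C"), each = 7),
  Time = rep(1:7, 3),
  Result = c(100, 96.9, 85.1, 62.0, 30.7, 15.2, 9.6,
             105, 90, 82, 55, 40, 23, 7,
             100, 55.61, 3.26, -4.77, -7.21, -3.2, -5.6)
)

df_p <- df %>%
  group_by(Group) %>%
  nest() %>%
  ungroup() %>%
  # One plain column per parameter; change individual entries to give a
  # group its own start value.
  mutate(starta = c(-3, -3, -3),
         startb = c(85, 85, 85),
         startc = c(4, 4, 4),
         startd = c(10, 10, 10)) %>%
  # pmap() walks the four columns row by row; list() collects each row's
  # values into a named list, the shape nls() accepts for `start`.
  mutate(start = pmap(list(a = starta, b = startb, c = startc, d = startd),
                      list))

# Same map2() call as in the answer: .x is the group's data, .y its starts.
fits <- df_p %>%
  mutate(model2 = map2(data, start,
                       ~ nls(Result ~ a + (b - a) / (1 + (Time / c)^d),
                             data = .x, start = .y)))
```

Keeping the numeric columns next to the list-column makes the start values easy to read and edit, at the cost of storing them twice.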

Psidom

I wasn't able to find a set of starting values that fits the model you specified, but here is one way to set up the fitting process. Keep the parameters starta, startb, etc. in the data alongside the Result and Time columns, and access them inside the mapped function with .$. Because unnesting repeats each start value on every row of its group, you need unique() to pick out a single value. With a simple model formula, a + b*Time, this produces the models in the model2 column. You can follow the same route and tweak the initial parameters passed to nls to fit the more complicated formula you specified:

library(tidyverse)

df_p %>% unnest(data) %>% group_by(Group) %>% nest() %>%
         mutate(model2 = map(data, ~nls(Result ~ a + b*Time, data = ., 
                                        start = c(a = unique(.$starta), 
                                                  b = unique(.$startb))
                                       )
                             )
               )

# A tibble: 3 × 3
#   Group             data    model2
#  <fctr>           <list>    <list>
#1      A <tibble [7 × 6]> <S3: nls>
#2      B <tibble [7 × 6]> <S3: nls>
#3      C <tibble [7 × 6]> <S3: nls>
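
The same idea can be sketched without the tidyverse, which may make the role of unique() easier to see. This is a base-R illustration under stated assumptions, not the answer's code: it uses the question's original data, stores starta and startb as ordinary columns repeated on every row, and introduces models as a made-up name for the fitted list.

```r
# Question's data with the start values stored as ordinary columns,
# so every row of a group repeats that group's start values.
df <- data.frame(
  Group = rep(c("A", "B", "C"), each = 7),
  Time = rep(1:7, 3),
  Result = c(100, 96.9, 85.1, 62.0, 30.7, 15.2, 9.6,
             10.2, 14.8, 32.26, 45.85, 56.25, 70.1, 100,
             100, 55.61, 3.26, -4.77, -7.21, -3.2, -5.6),
  starta = rep(c(-3, 4, -3), each = 7),
  startb = rep(85, 21)
)

# Fit the simple linear formula once per group. unique() collapses the
# seven repeated start values to the single number nls() expects.
models <- lapply(split(df, df$Group), function(d) {
  nls(Result ~ a + b * Time, data = d,
      start = c(a = unique(d$starta), b = unique(d$startb)))
})

# Estimated coefficients, one row per group.
t(sapply(models, coef))
```

Because the formula is linear in a and b, the fit does not depend on the start values here; they only begin to matter once the four-parameter formula from the question is used.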