Using purrr to fit many survival models using tidymodels

168 views Asked by At

I am trying to fit many survival models using tidymodels, workflow, and purr. I can get this approach to work for other models, e.g., linear regression, but not survival. I have loaded the survival extension to parsnip.

Here is code to

  1. Generate a small dataset.
  2. Demonstrate that usual cox-ph works fine.
  3. Run linear regression with tidymodels and workflows and it works fine (models1).
  4. However, models2 and models3 result in error message: Error in `fit_xy()': !Models for censored regression must use the formula interface.

This message does not help me much. I suppose it has something to do with the in-line Surv() object but haven't been able to see how to construct this another way.

I realize that parsnip for survival may still be work-in-progress, but it is unclear to me if this has been implemented and should work or not. Appreciate any help, Thanks.

library(tidyverse)
library(tidymodels)
library(survival)
library(censored)

set.seed(1973)
df <- tibble(id = seq(1:1000))
df <- df %>% mutate( survtime = floor(100*rgamma(n=1000, shape =1 , rate=1)) ,
                     fail = runif(n=1000) >0.33 ,
                      a1 = runif(n=1000) >0.1,
                     a2 = runif(n=1000) >0.5)
                     

head(df)

cox1 <- coxph(data = df, Surv(survtime, event=fail) ~a1)
summary(cox1)

cox2 <- coxph(data = df, Surv(survtime, event=fail) ~a2)
summary(cox2)

A <- c("a1", "a2")
models1 <- map(A,
               ~workflow() %>%
                 add_model(linear_reg()) %>%
                 add_formula(reformulate(.x, response = 'survtime')) %>%
                 fit(df)
)
models1

#not working
models2 <- map(A,
               ~workflow() %>%
                 add_model(proportional_hazards()) %>%
                 add_formula(reformulate(.x, response = 'Surv(survtime, event=fail)')) %>%
                 fit(df)
)


#not working
models3 <- map(A,
               ~workflow() %>%
                 add_model(proportional_hazards()) %>%
                 add_formula(as.formula(paste0('Surv(survtime, event=fail) ~ ', .x))) %>%
                 fit(df)
)
models3

EDITED to change variables X, x1, and x2 to A, a1, and a2 for clarity. Error message is the same:

Error in `fit_xy()': !Models for censored regression must use the formula interface.

However, problem may be version of R or packages. Error happens with version 4.1.2 with survival 3.4-0 parsnip 1.0.1

but code works on another system with R 4.2.2 parsnip 1.0.3 survival 3.3-1

Unfortunately, I am at sysadmin's mercy for updates so I did not check versioning and cannot easily troubleshoot.

1

There are 1 answers

1
topepo On

Uppercase "X" instead of lower case "x" is the issue.

Your not the first to do this (I stared at it for a few minutes to see it) so don't feel bad.

However... this example is exactly why we have the reprex package. That would have told you the issue right away since it starts with a fresh session (and would show us the error message)

Knowing's half the battle :-)

library(tidyverse)
library(tidymodels)
library(survival)
library(censored)

set.seed(1973)
df <- tibble(id = seq(1:1000))
df <- df %>% mutate( survtime = floor(100*rgamma(n=1000, shape =1 , rate=1)) ,
                     fail = runif(n=1000) >0.33 ,
                      x1 = runif(n=1000) >0.1,
                     x2 = runif(n=1000) >0.5)
                     

head(df)
#> # A tibble: 6 × 5
#>      id survtime fail  x1    x2   
#>   <int>    <dbl> <lgl> <lgl> <lgl>
#> 1     1       27 FALSE TRUE  FALSE
#> 2     2       77 FALSE TRUE  TRUE 
#> 3     3      180 FALSE TRUE  FALSE
#> 4     4      127 FALSE TRUE  TRUE 
#> 5     5       69 TRUE  TRUE  TRUE 
#> 6     6       20 FALSE TRUE  TRUE

cox1 <- coxph(data = df, Surv(survtime, event=fail) ~x1)
summary(cox1)
#> Call:
#> coxph(formula = Surv(survtime, event = fail) ~ x1, data = df)
#> 
#>   n= 1000, number of events= 692 
#> 
#>          coef exp(coef) se(coef)     z Pr(>|z|)  
#> x1TRUE 0.2854    1.3303   0.1250 2.284   0.0224 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>        exp(coef) exp(-coef) lower .95 upper .95
#> x1TRUE      1.33     0.7517     1.041     1.699
#> 
#> Concordance= 0.509  (se = 0.007 )
#> Likelihood ratio test= 5.62  on 1 df,   p=0.02
#> Wald test            = 5.22  on 1 df,   p=0.02
#> Score (logrank) test = 5.25  on 1 df,   p=0.02

cox2 <- coxph(data = df, Surv(survtime, event=fail) ~x2)
summary(cox2)
#> Call:
#> coxph(formula = Surv(survtime, event = fail) ~ x2, data = df)
#> 
#>   n= 1000, number of events= 692 
#> 
#>           coef exp(coef) se(coef)     z Pr(>|z|)  
#> x2TRUE 0.13210   1.14122  0.07655 1.726   0.0844 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>        exp(coef) exp(-coef) lower .95 upper .95
#> x2TRUE     1.141     0.8763    0.9822     1.326
#> 
#> Concordance= 0.513  (se = 0.011 )
#> Likelihood ratio test= 2.98  on 1 df,   p=0.08
#> Wald test            = 2.98  on 1 df,   p=0.08
#> Score (logrank) test = 2.98  on 1 df,   p=0.08

X <- c("x1", "x2")
models1 <- map(X,
               ~workflow() %>%
                 add_model(linear_reg()) %>%
                 add_formula(reformulate(.x, response = 'survtime')) %>%
                 fit(df)
)
models1
#> [[1]]
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> survtime ~ x1
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#> (Intercept)       x1TRUE  
#>      121.24       -23.58  
#> 
#> 
#> [[2]]
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> survtime ~ x2
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#> (Intercept)       x2TRUE  
#>     104.055       -7.706

#not working
models2 <- map(x,
               ~workflow() %>%
                 add_model(proportional_hazards()) %>%
                 add_formula(reformulate(.x, response = 'Surv(survtime, event=fail)')) %>%
                 fit(df)
)
#> Error in vctrs_vec_compat(.x, .purrr_user_env): object 'x' not found


#not working
models3 <- map(x,
               ~workflow() %>%
                 add_model(proportional_hazards()) %>%
                 add_formula(as.formula(paste0('Surv(survtime, event=fail) ~ ', .x))) %>%
                 fit(df)
)
#> Error in vctrs_vec_compat(.x, .purrr_user_env): object 'x' not found
models3
#> Error in eval(expr, envir, enclos): object 'models3' not found

Created on 2023-01-06 with reprex v2.0.2