Numerical Differences in Survival Probability Estimates between coxph and cph Functions

88 views Asked by At

I am currently working on obtaining survival probabilities from Cox proportional hazard models. Initially, I utilized the coxph function, but for large data, it proved to be slow when predicting at all time points for all different individuals in the data. I tried using the cph function from the rms package which significantly improved the computational speed. However, I noticed slight differences in the predicted probabilities between the two methods. Specifically, at smaller time points, the differences are up to about 5 decimal places, while at higher time points, they can reach up to 2 decimal places. I set parameters to be the same and even obtained similar coefficient estimates.

Here are the codes from both functions:

From the coxph function:

S_t <- coxph(Surv(time = Y, event = d) ~ ns(age)+ns(psa)+tstage+deyo+gs+insurance+income+education+race, ties="breslow", data = data)

S_t_pred.obj <- survfit(S_t, stype=1, newdata=X, se.fit = FALSE)

S_t_pred.sum <- summary(S_t_pred.obj, times=Y, extend = TRUE)$surv

From the cph function:

S_t1 <- cph(Surv(time = Y, event = d) ~ ns(age)+ns(psa)+tstage+deyo+gs+insurance+income+education+race, method="breslow", data = data, surv=T)

S_t_pred.sum1 <- survest(S_t1, stype=1, times=Y, newdata=X, se.fit = FALSE, extend = TRUE)$surv

I would greatly appreciate any insights or guidance on why these differences may be occurring. Thank you for your time.

1

There are 1 answers

2
jared_mamrot On

Looking at the source code for each function (https://github.com/harrelfe/rms/blob/master/R/cph.s / https://github.com/cran/survival/blob/master/R/coxph.R), there are some options you can adjust to get the same output, e.g.

library(survival)
library(survminer)
# install.packages("rms")
library(rms, quietly = TRUE)

fit <- coxph(Surv(time, status) ~ age, data = lung, ties = "breslow")
fit$var
#>              [,1]
#> [1,] 8.460386e-05

fit2 <- cph(Surv(time, status) ~ age, data = lung, method = "breslow")
fit2$var
#>              age
#> age 8.460363e-05

fit3 <- cph(Surv(time, status) ~ age, data = lung, method = "breslow", singular.ok = TRUE, eps = 1e-6)
fit3$var
#>              age
#> age 8.460386e-05

Created on 2023-11-17 with reprex v2.0.2

Does that work with your actual data?