I am trying to understand the possibilities and limitations of survival analysis, in particular the lifelines Python package.
I fitted a Cox proportional hazards model to the rossi data and got a survival function covering the historical (observed) period, which is clear.
Here is my code:
import pandas as pd
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
cph1 = CoxPHFitter()
cph1.fit(rossi, duration_col='week', event_col='arrest')
# note: newer lifelines releases rename this method to plot_partial_effects_on_outcome
cph1.plot_covariate_groups('race', [0, 1])
My questions are:
1. Can we somehow predict future survival probabilities for censored subjects using lifelines or any other Python library for survival analysis? That is, can the survival function be extended beyond the observed period (e.g. the probability of surviving past 60 weeks)?
2. Can we use the fitted model to compute the survival function for new samples, given their feature values?
Regarding my first question, I tried this (from the lifelines docs):
censored_subjects = rossi.loc[~rossi['arrest'].astype(bool)]
censored_subjects_last_obs = censored_subjects['week']
# predict each censored subject's remaining survival function,
# conditional on having survived up to their last observation
cph1.predict_survival_function(censored_subjects,
                               conditional_after=censored_subjects_last_obs)
Yes, you can do this if you use a parametric baseline distribution instead of a non-parametric baseline.
You can do this in surpyval (I am its lead developer) with a WeibullPH model. "PH" means it is a proportional hazards model, in the same way as CoxPH, but the Weibull part means the baseline is a Weibull distribution rather than a non-parametric estimate.
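To make the difference from CoxPH concrete, here is one common way to write a Weibull proportional hazards model. The symbols are assumptions for illustration (scale $\alpha$, shape $\beta$, coefficient vector $\gamma$, covariate vector $Z$) and may not match the parameterization a given library reports:

```latex
h(t \mid Z) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1} \exp\!\left(Z^{\top}\gamma\right),
\qquad
S(t \mid Z) = \exp\!\left(-\left(\frac{t}{\alpha}\right)^{\beta} \exp\!\left(Z^{\top}\gamma\right)\right)
```

Because the baseline is a closed-form function of $t$, $S(t \mid Z)$ is defined for every $t \ge 0$, not only up to the last observed event time.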
You can then use the model to predict survival probabilities beyond the maximum observation time using the sf method of the model you've created. For instance, you can evaluate it for the first set of covariates, i.e. for the person who has the covariates in the first row of the Rossi data.
The key assumption here is that the underlying survival times follow a Weibull distribution. It is this assumption that enables you to "see into the future" with the extrapolation, so the predictions are only as good as the Weibull fit.
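As a rough illustration of what such an extrapolation computes, the sketch below evaluates a Weibull proportional hazards survival function by hand, using only the standard library. It does not call surpyval or lifelines. The function names, the parameter values and the two covariates are all hypothetical, chosen only to show the mechanics, and are not fitted to the Rossi data.

```python
import math


def weibull_ph_sf(t, z, alpha, beta, coefs):
    """Survival probability S(t | z) = exp(-(t / alpha)**beta * exp(z . coefs))."""
    linear_predictor = sum(zi * bi for zi, bi in zip(z, coefs))
    return math.exp(-((t / alpha) ** beta) * math.exp(linear_predictor))


def conditional_sf(t, t0, z, alpha, beta, coefs):
    """P(T > t | T > t0, z) for t >= t0, e.g. for a subject censored at t0."""
    return (weibull_ph_sf(t, z, alpha, beta, coefs)
            / weibull_ph_sf(t0, z, alpha, beta, coefs))


# Hypothetical parameters (NOT estimated from any data).
alpha, beta = 120.0, 1.4   # assumed Weibull scale and shape
coefs = [-0.35, -0.05]     # assumed effects of two covariates
z = [1, 27]                # assumed covariate values for one subject

# Survival probabilities at and beyond a 52-week observation window.
for week in (52, 60, 80, 104):
    print(week, round(weibull_ph_sf(week, z, alpha, beta, coefs), 3))

# For a subject still event-free at week 52, survival past week 80.
print(round(conditional_sf(80, 52, z, alpha, beta, coefs), 3))
```

The conditional version is the quantity of interest for censored subjects: it rescales the curve by the probability of having survived up to the last observation.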