Is there any way to predict survival probabilities for censored objects beyond the historical dates (prediction into the future)?


I am trying to understand the possibilities and limitations of survival analysis, in particular the lifelines Python package.

I fitted a Cox proportional hazards model to the Rossi data and got a survival function covering the historical (observed) period, which is clear.

Here is my code:

import pandas as pd
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter

rossi = load_rossi()
cph1 = CoxPHFitter()
cph1.fit(rossi, duration_col='week', event_col='arrest')
# Note: newer lifelines releases rename this method to plot_partial_effects_on_outcome
cph1.plot_covariate_groups('race', [0, 1])

[Figure: survival function over historical dates]

My questions are:

1. Can we somehow predict future survival probabilities of censored objects using the lifelines package or any other Python library for survival analysis? I mean making the survival function go beyond the historical period (e.g. the probability of survival after 60 weeks).

2. Can we use the fitted model to compute the survival function for new samples of data, given their feature values?

Regarding my first question, I tried this (from the lifelines docs):

censored_subjects = rossi.loc[~rossi['arrest'].astype(bool)]
censored_subjects_last_obs = censored_subjects['week']
# predict new survival function
cph1.predict_survival_function(censored_subjects,
                               conditional_after=censored_subjects_last_obs)

But it returns the following 49x318 dataframe:

[Figure: returned dataframe]

Derryn Knife On

Yes, you can do this if you use a parametric baseline distribution instead of a non-parametric baseline.

You can do this in surpyval (I am its lead developer) with a WeibullPH model. "PH" means it is a proportional hazards model, in the same way as CoxPH, but the "Weibull" part means the baseline is a Weibull distribution.

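import surpyval as surv
from lifelines.datasets import load_rossi

df = load_rossi()
x = df["week"]                                  # observed durations
c = 1 - df["arrest"].astype(int)                # surpyval flag: 1 = right censored, 0 = observed
Z = df.drop(["week", "arrest"], axis=1).values  # covariate matrix
model = surv.WeibullPH.fit(Z, x, c=c)
print(model)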

This results in:

Parametric Regression SurPyval Model
====================================
Kind                : Proportional Hazard
Distribution        : Weibull
Regression Model    : Log Linear (Exponential)
Fitted by           : MLE
Distribution        :
    alpha: 54.06104249869792
    beta: 1.4036999665486276
Regression Model    :
    beta_0: -0.3820354740362787
    beta_1: -0.0571506013475474
    beta_2: 0.3155487091456667
    beta_3: -0.14957341139264593
    beta_4: -0.43693491104474835
    beta_5: -0.08257869991010232
    beta_6: 0.09238644785400409

You can then use the model to predict survival probabilities beyond the maximum observation time using the sf method of the fitted model. In this example we use the first set of covariates, i.e. those of the person in the first row of the Rossi data.

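import numpy as np
from matplotlib import pyplot as plt

x_plt = np.arange(0, 100)      # weeks 0..99, past the last observed week
y_plt = model.sf(x_plt, Z[0])  # survival curve for the first row's covariates
plt.plot(x_plt, y_plt)
plt.show()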

[Figure: Weibull proportional hazards plot]

The key assumption made here is that the underlying data is Weibull distributed. It is this assumption that lets you "see into the future" by extrapolating the survival function past the last observed time.
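To make the extrapolation concrete, here is a minimal pure-Python sketch of what such a model computes. It assumes the usual Weibull proportional hazards form, S(t | z) = exp(-(t / alpha)^beta * exp(z . coefs)), which I have not checked against surpyval's internals, so treat the parameterisation as an assumption. The alpha, beta and coefficient values are copied from the model summary above. The covariate vector is illustrative only, and the helper names weibull_ph_sf and conditional_sf are made up for this sketch. For a subject still censored at the end of follow-up (52 weeks in the Rossi data), the quantity of interest is the conditional survival P(T > t | T > 52) = S(t) / S(52).

```python
import math

# Fitted values copied from the model summary above.
ALPHA = 54.06104249869792   # Weibull scale
BETA = 1.4036999665486276   # Weibull shape
COEFS = [
    -0.3820354740362787, -0.0571506013475474, 0.3155487091456667,
    -0.14957341139264593, -0.43693491104474835, -0.08257869991010232,
    0.09238644785400409,
]

# Illustrative covariate vector (an assumption, not taken from the answer),
# in the column order fin, age, race, wexp, mar, paro, prio.
z_example = [0, 27, 1, 0, 0, 1, 3]


def weibull_ph_sf(t, z, alpha=ALPHA, beta=BETA, coefs=COEFS):
    """Survival S(t | z) under the assumed Weibull PH parameterisation."""
    linear_predictor = sum(coef * value for coef, value in zip(coefs, z))
    return math.exp(-((t / alpha) ** beta) * math.exp(linear_predictor))


def conditional_sf(t, survived_until, z):
    """P(T > t | T > survived_until) = S(t) / S(survived_until)."""
    if t < survived_until:
        raise ValueError("t must be at or after the last observation time")
    return weibull_ph_sf(t, z) / weibull_ph_sf(survived_until, z)


# Survival beyond week 52 for a subject known to be event-free at week 52.
for week in (52, 60, 80, 100):
    print(week, round(conditional_sf(week, 52, z_example), 3))
```

Because the baseline is a closed-form distribution, nothing stops t from exceeding the last observed week. How trustworthy those values are depends entirely on how well the Weibull assumption holds outside the observed range.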