I am building a forecast model using AalenAdditiveFitter from Lifelines in Python to predict whether an event will occur or not and when.
T (time) is measured in months, and C (event) is 1 if the event occurred and 0 if it did not.
In addition I have 8 attributes that I am using.
from lifelines import AalenAdditiveFitter
aaf = AalenAdditiveFitter(coef_penalizer=1., fit_intercept=True)
cx1 = aaf.fit(trainX.drop(['index'], axis=1), duration_col='T', event_col='C', show_progress=True)
I am able to build a relatively stable model and get cumulative hazard probabilities using the following method:
stestXsurvived = cx1.predict_cumulative_hazard(stestX.drop(['T','C'], axis=1))
Is there a way to get conditional/marginal probabilities directly from AalenAdditiveFitter?
So after doing a little more digging, can I assume the following?
- I get cumulative hazard probabilities from Aalen Additive model
- To convert them to conditional probabilities for each individual month, I can just subtract the prior month's value: P(t) - P(t-1)
This is based on the answer posted at https://quant.stackexchange.com/questions/21816/cumulative-vs-marginal-probability-of-default
I'm not sure whether the solution is really this simple, so please help.
If you difference the cumulative hazard in the way you suggest, you will get h(t), the hazard. h(t) does amount to a conditional probability for discrete-time durations. Note, though: for continuous-time durations, h(t) is a rate (it can be larger than 1, for instance).
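To make the rate-versus-probability distinction concrete, here is a minimal sketch in plain Python. The numbers in H are made up for illustration and are not output from the model in the question. It assumes the cumulative hazard starts at H(0) = 0, and it uses the standard continuous-time relationship S(t) = exp(-H(t)), so that the probability of the event in month t, given survival to month t-1, is 1 - exp(-(H(t) - H(t-1))).

```python
import math

def monthly_increments(cum_hazard):
    """Difference a cumulative hazard path H(1), H(2), ..., assuming H(0) = 0."""
    previous = 0.0
    increments = []
    for value in cum_hazard:
        increments.append(value - previous)
        previous = value
    return increments

def conditional_event_probs(cum_hazard):
    """P(event in month t | survived to t-1) = 1 - exp(-(H(t) - H(t-1)))."""
    return [1.0 - math.exp(-dh) for dh in monthly_increments(cum_hazard)]

# Hypothetical cumulative hazard for one individual at months 1..5
H = [0.05, 0.05, 0.20, 0.60, 1.80]

hazard = monthly_increments(H)      # differenced cumulative hazard, h(t)
probs = conditional_event_probs(H)  # always stays below 1

for month, (h, p) in enumerate(zip(hazard, probs), start=1):
    print(f"month {month}: increment={h:.3f}  conditional probability={p:.3f}")
```

In the last month the increment is larger than 1 while the conditional probability is not, which is the "rate, not probability" point. On the DataFrame returned by predict_cumulative_hazard (time on the index, one column per individual), pandas' DataFrame.diff() should give the same increments apart from the first row. Estimates from an additive model are not guaranteed to be non-decreasing, so negative increments may need to be handled before applying the formula.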
As an aside: I cannot remember offhand whether Aalen's additive model is semi-parametric. If it is, the cumulative hazard will only change in value in the months where we see a failure. That won't affect your (month - previous month) calculation: in months where we observe no failures, the difference will simply come out to be 0, which is always the case for semi-parametric duration models.
If you wanted to save computing power, you could take the cumulative hazard at one failure time (call this t_k) and subtract from it the cumulative hazard at the last failure time before this one (call this t_{k-1}). The answer you get would be the same, once you wrap your mind around what the new quantity is telling you: if the cumulative hazard changes by that much between t_{k-1} and t_k, and semi-parametric hazards (and, therefore, the cumulative hazard, too) only update when we see failures, then any time point falling strictly between t_{k-1} and t_k must have a hazard of 0.
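A small sketch of that equivalence, assuming a step-function cumulative hazard that only jumps at observed failure times. The failure times and values below are hypothetical, and evaluate_step is an illustrative helper, not part of lifelines.

```python
def evaluate_step(jump_values, grid):
    """Evaluate a step-function cumulative hazard on a time grid.

    jump_values maps each failure time to the cumulative hazard reached there;
    before the first failure time the cumulative hazard is taken to be 0.
    """
    out = []
    for t in grid:
        past = [s for s in jump_values if s <= t]
        out.append(jump_values[max(past)] if past else 0.0)
    return out

# Hypothetical: failures observed only in months 3 and 7
H_at_failures = {3: 0.10, 7: 0.35}
months = list(range(1, 9))

H_monthly = evaluate_step(H_at_failures, months)

# Month-by-month differencing, with H(0) = 0
monthly_diff = [b - a for a, b in zip([0.0] + H_monthly[:-1], H_monthly)]

# Differencing only between consecutive failure times t_{k-1} = 3 and t_k = 7
jump = H_at_failures[7] - H_at_failures[3]

print(monthly_diff)
print(jump)
```

The monthly differences are zero everywhere except months 3 and 7, and the month-7 difference equals the jump computed from the two failure times alone.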