Predicting baseline cumulative hazard using predict.coxph in R


My aim is to use the fitted model below to predict, for each observation (including new observations), the cumulative hazard from time 0 up to its start time.

I have fitted the Cox model with two times per observation: a start time (which is not 0) and an end time. I can then take the difference between the cumulative hazard at the end time (i.e. the cumulative hazard from 0 to the end time, which I have already calculated with the same fitted model) and the cumulative hazard at the start time (i.e. the cumulative hazard from 0 to the start time, which I want to calculate here). That difference gives the cumulative hazard between the start and end time of each observation.

To get the expected number of events, I've used predict(coxph(), newdata, type = "expected").
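Written out, the decomposition described above is as follows. This is a sketch assuming a standard Cox model with baseline cumulative hazard \Lambda_0 and linear predictor x_i^\top\beta for row i; the notation is mine, not from the question. E_i is the quantity predict(..., type = "expected") returns for a row whose interval is (t_start,i, t_end,i].

```latex
\begin{aligned}
\Delta\Lambda_{0,i} &= \Lambda_0(t_{\mathrm{end},i}) - \Lambda_0(t_{\mathrm{start},i}),\\
E_i &= \exp(x_i^\top \beta)\,\Delta\Lambda_{0,i}.
\end{aligned}
```

With no covariates, or a covariate whose coefficient is 0, the factor exp(x_i^T beta) equals 1. The quantity wanted here is then simply \Lambda_0(t_{\mathrm{start},i}), i.e. E_i for the interval (0, t_start,i].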

The data I have used is as follows:

N <- 10^4 # population
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05*x1 + 0.2*x2
  # event time drawn from the Weibull conditional on T > start_time (inverse-CDF sampling)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = 84*exp(-lp/7.5)), 1), shape = 7.5, scale = 84*exp(-lp/7.5))
  Cens1 <- 100 # administrative censoring time
  event_time <- pmin(Tm, Cens1)
  status <- as.numeric(event_time == Tm)})
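Tm is drawn by inverse-CDF sampling from the Weibull distribution restricted to (start_time, Inf). Every simulated event time should therefore lie strictly after its start time, which Surv(start, stop, event) requires. The sketch below runs a quick check on that. The set.seed() call and the shorthand sc for the Weibull scale are my additions, not part of the question.

```r
set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)  # shorthand for the Weibull scale used in the question
  # inverse-CDF draw from the Weibull conditional on T > start_time
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)  # administrative censoring at 100
  status <- as.numeric(event_time == Tm)
})

# every interval (start_time, event_time] must be non-empty for Surv()
stopifnot(all(H$event_time > H$start_time))
# share of observed events (1) vs censored rows (0)
prop.table(table(H$status))
```

Because start_time < 50 and censoring happens at 100, the check holds for censored rows as well.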

and the code for prediction is:

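library(survival)
H$X <- rep(1, nrow(H))  # a covariate equal to 1 for everyone
D <- coxph(Surv(start_time, event_time, status) ~ X, data = H, x = TRUE)
# expected events (cumulative hazard) over (0, start_time] for each row
pred2 <- predict(D, newdata = data.frame(start_time = rep(0, nrow(H)), event_time = H$start_time, status = rep(0, nrow(H)), X = rep(1, nrow(H))), type = "expected")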

But pred2 contains only NA values. Can someone point out whether there is a mistake in my reasoning or in the code?

Please let me know if any further clarification is required.

Answer by Aria (accepted)

I found the answer myself. It's just a quick trick, and I'm not sure it will always work. I run the following line before calling predict():

D$coefficients["X"] <- 0

With that, I get proper values. I checked them with the nelsonaalen() function from the mice package, although that function does not accept a start time (it only takes a single time variable).

Let me know if there's any other proper way to solve it.
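One alternative sidesteps the constant covariate entirely. This is a sketch that assumes a recent version of the survival package, where survfit() on a coxph fit returns a cumhaz component. Fit the null model, take its cumulative baseline hazard, and evaluate it as a right-continuous step function at any time you like (here the start times). The names H0, cumhaz_start, cumhaz_end and cumhaz_interval are only illustrative.

```r
library(survival)

set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)
  status <- as.numeric(event_time == Tm)
})

# null model: no covariates, only the baseline hazard is estimated
fit0 <- coxph(Surv(start_time, event_time, status) ~ 1, data = H)
sf <- survfit(fit0)

# step function for the cumulative baseline hazard; 0 before the first time point
H0 <- stepfun(sf$time, c(0, sf$cumhaz))

cumhaz_start <- H0(H$start_time)  # cumulative hazard from 0 to start_time
cumhaz_end <- H0(H$event_time)    # cumulative hazard from 0 to event_time
cumhaz_interval <- cumhaz_end - cumhaz_start
summary(cumhaz_start)
```

Because H0 is an ordinary R function, it can also be evaluated at times that do not appear in the data.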

Answer by StupidWolf

There are two issues. First, specifying ~ 1 fits a null model with no coefficients, so the linear predictor is the same value for every observation:

library(survival)
D <- coxph(Surv(H$start_time, H$event_time, H$status) ~ 1, x = TRUE )
names(D)
 [1] "loglik"            "linear.predictors" "method"           
 [4] "residuals"         "n"                 "nevent"           
 [7] "terms"             "assign"            "concordance"      
[10] "x"                 "y"                 "timefix"          
[13] "formula"           "call"  

table(predict(D))

    0 
10000
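The linear predictor of the null model is indeed 0 for every row, but the fitted object still carries the baseline hazard. In the sketch below, type = "expected" is called without newdata. Under the assumption that this is computed from the fitted data, predict() returns each row's cumulative hazard over its own interval (start_time, event_time]. By definition, that equals the event indicator minus the martingale residual.

```r
library(survival)

set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)
  status <- as.numeric(event_time == Tm)
})

D <- coxph(Surv(start_time, event_time, status) ~ 1, data = H)
table(predict(D))                    # linear predictor: 0 for everyone
e <- predict(D, type = "expected")   # Lambda0(event_time) - Lambda0(start_time) per row
m <- residuals(D, type = "martingale")
summary(e)
```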

I don't think that makes a lot of sense, and that is why you run into the errors. So you need to predict with independent variables that are used in the regression, for example:

D <- coxph(Surv(start_time,event_time,status) ~ x1+x2, data=H )
# newdata must contain the Surv() variables and the covariates used in the fit (x1, x2)
pred2 <- predict(D, newdata = data.frame(start_time = 0, event_time = H$start_time, status = 0, x1 = H$x1, x2 = H$x2), type = "expected")

predict(D,newdata=data.frame(x1=runif(10,0,1),x2=runif(10,-1,1)))
        1         2         3         4         5         6         7         8 
0.3033206 0.4213120 0.3952827 0.3879701 0.4798670 0.2170032 0.3385253 0.4141698 
        9        10 
0.3690579 0.4128084 
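The subtraction idea from the question can be checked directly with this covariate model. The sketch assumes predict(type = "expected") uses the (start, stop] interval given in newdata. Take the expected count from 0 to event_time and subtract the expected count from 0 to start_time. The result should then reproduce the expected count over (start_time, event_time] on the fitted data. nd_start, nd_end and the e_* names are illustrative.

```r
library(survival)

set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)
  status <- as.numeric(event_time == Tm)
})

D <- coxph(Surv(start_time, event_time, status) ~ x1 + x2, data = H, x = TRUE)

# intervals (0, start_time] and (0, event_time] for the same covariate values
nd_start <- data.frame(start_time = 0, event_time = H$start_time, status = 0, x1 = H$x1, x2 = H$x2)
nd_end <- data.frame(start_time = 0, event_time = H$event_time, status = 0, x1 = H$x1, x2 = H$x2)

e_start <- predict(D, newdata = nd_start, type = "expected")
e_end <- predict(D, newdata = nd_end, type = "expected")
e_interval <- predict(D, type = "expected")  # fitted rows: (start_time, event_time]

summary(e_end - e_start - e_interval)  # should be ~0
```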

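When you fit a model where X = 1 for every observation, you get NA. The Cox model has no explicit intercept, but a constant covariate is absorbed into the baseline hazard. It is therefore redundant, and its coefficient cannot be estimated. You can check: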

H$X = 1
D <- coxph(Surv(start_time, event_time, status) ~ X, data = H)
D

Call:
coxph(formula = Surv(start_time, event_time, status) ~ X, data = H)

  coef exp(coef) se(coef)  z  p
X   NA        NA        0 NA NA
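The sketch below shows the NA directly, and why zeroing it (the workaround from the accepted answer) gives usable numbers. With X constant, exp(0 * X) = 1, so the prediction reduces to the baseline cumulative hazard over (0, start_time]. It assumes predict() accepts the manually edited coefficient, as reported in the accepted answer. est is an illustrative name.

```r
library(survival)

set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)
  status <- as.numeric(event_time == Tm)
})

H$X <- 1
D <- coxph(Surv(start_time, event_time, status) ~ X, data = H, x = TRUE)
est <- coef(D)  # X is NA: a constant covariate is not estimable
est

D$coefficients["X"] <- 0  # workaround from the accepted answer
nd <- data.frame(start_time = 0, event_time = H$start_time, status = 0, X = 1)
pred2 <- predict(D, newdata = nd, type = "expected")
summary(pred2)
```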

A coefficient can only be estimated when X actually varies in the fitted data, so here is an example with 2 covariates:

H$X = runif(nrow(H))
D <- coxph(Surv(start_time, event_time, status) ~ X + x1,data=H)

You can then predict by, for example, fixing X at 1 and varying x1:

predict(D,newdata=data.frame(X=1,x1=c(0.1,0.2,0.3)))
         1          2          3 
-0.1132548 -0.1084592 -0.1036637 

or X at 2:

predict(D,newdata=data.frame(X=2,x1=c(0.1,0.2,0.3)))
         1          2          3 
-0.1579480 -0.1531524 -0.1483568
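The two outputs differ only through X. The default linear predictor is centered, (x - mean) times beta, so raising X from 1 to 2 shifts every prediction by the X coefficient. In the run above, -0.1579480 - (-0.1132548) = -0.0446932. Likewise, type = "risk" is just exp() of the linear predictor. The sketch below checks both on freshly simulated data, so its numbers will differ from those shown. lp1, lp2, risk1 and x1_grid are illustrative names.

```r
library(survival)

set.seed(1)  # added only for reproducibility
N <- 10^4
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05 * x1 + 0.2 * x2
  sc <- 84 * exp(-lp / 7.5)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = sc), 1), shape = 7.5, scale = sc)
  event_time <- pmin(Tm, 100)
  status <- as.numeric(event_time == Tm)
})

H$X <- runif(nrow(H))
D <- coxph(Surv(start_time, event_time, status) ~ X + x1, data = H)

x1_grid <- c(0.1, 0.2, 0.3)
lp1 <- predict(D, newdata = data.frame(X = 1, x1 = x1_grid))  # type = "lp" by default
lp2 <- predict(D, newdata = data.frame(X = 2, x1 = x1_grid))
risk1 <- predict(D, newdata = data.frame(X = 1, x1 = x1_grid), type = "risk")

lp2 - lp1   # each entry equals coef(D)["X"]
coef(D)
```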