How to calculate baseline hazard in R?

154 views Asked by At

I have a Cox regression model as follows:


library(dplyr)
library(survival)
library(eha)

df <- data.frame(id = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 sex = c("0", "0", "0", "1", "1", "0", "0", "0"),
                 start = c(0, 1, 5, 0, 7, 0, 2, 10),
                 stop = c(1, 5, 10, 7, 18, 2, 10, 19), 
                 status = c(1, 1, 0, 1, 0, 1, 1, 0))

  id sex start stop status
1  a   0     0    1      1
2  a   0     1    5      1
3  a   0     5   10      0
4  b   1     0    7      1
5  b   1     7   18      0
6  c   0     0    2      1
7  c   0     2   10      1
8  c   0    10   19      0

# Sex violates proportional hazards
logrank(Surv(start, stop, status), group = sex, data = df)

# Regression
fit <- coxph(Surv(start, stop, status) ~ strata(sex) + cluster(id), data = df)

cluster(id) represents the implementation of the Anderson-Gill model for recurrent events. I want to estimate the (non-cumulative) hazard function from this. I know I can obtain the cumulative baseline hazard using:

# Cumulative hazard
basehaz(fit)

But how to obtain the non-cumulative hazard? Any help is appreciated.

1

There are 1 answers

2
IRTFM On BEST ANSWER

The basehaz function will return results for stratified models. Essentially it breaks the analysis into stratum-specific hazards:

df <- data.frame(id = c("a", "a", "a", "b", "b", "c", "c", "c"),
                  sex = c("0", "0", "0", "1", "1", "0", "0", "0"),
                  start = c(0, 1, 5, 0, 7, 0, 2, 10),
                  stop = c(1, 5, 10, 7, 18, 2, 10, 19), 
                  status = c(1, 1, 0, 1, 0, 1, 1, 0))
 
fit <- coxph(Surv(start, stop, status) ~ strata(sex) + cluster(id), data = df) 
basehaz(fit)
#------
  hazard time strata
1    0.5    1      0
2    1.0    2      0
3    1.5    5      0
4    2.0   10      0
5    2.0   19      0
6    1.0    7      1
7    1.0   18      1

Contrast that result with the results from a "null" model.

fit <- coxph(Surv(start, stop, status) ~ 1, data = df)
basehaz(fit)
#------------
     hazard time
1 0.3333333    1
2 0.6666667    2
3 1.0000000    5
4 1.3333333    7
5 1.6666667   10
6 1.6666667   18
7 1.6666667   19

The results are consistent with the usual explanation that stratified model deliver a weighted average of the results in two separated strata.