Stuck with package example code in R - simulating data to fit a model

521 views Asked by At

I am trying to understand the function indeptCoxph in the spBayesSurv package. This function fits a Bayesian proportional hazards model. I am getting a little stuck with understanding parts of the R code as well as the Cox model theory.

I am working on the authors' example (below). They have first simulated survival time data and I am having trouble following their code for doing this. It seems to me that first they are simulating survival times from an exponential distribution with CDF F(t) = 1- exp(-lambda*t) except that their value for lambda is exp(sum(xi * betaT)) rather than just a constant. In order to simulate data, the parameter betaT is given a fixed constant value which is its true value and xi is the predictor data.

Question 1-Is this definition/form of lambda due to the Cox Hazard model ? In this example, are the authors making special assumptions about the survival distribution?

Question 2- I am stuck with understanding the following key piece of code which generates the survival time data(of course it relies on earlier code, given at the end):

## Generate survival times t

u = pnorm(z);
t = rep(0, ntot);
for (i in 1:ntot){
t[i] = Finv(u[i], x[i]);
}
tTrue = t; #plot(x,t);

The function Finv(u,xi) gets the value of survival time t that satisfies F(t) = u, where I think xi is the predictor variable. I don't really understand why u has to come from the normal CDF. They have generated "z" as a single draw from a multivariate normal distribution (with 3 components ), and u is the vector of Normal CDF values u = pnorm(z). Again, not sure why "u" has to be generated this way - would be really helpful if the relationship between u,z,t and lambda could be clarified. The covariance matrix for "z" also is generated by the author from two row vectors s1, and s2 in the code - but its confusing what the role of s1,s2 would be if I were just fitting a model with survival time data "t" and predictor variable "x".

Authors' code:

###############################################################
# A simulated data: Cox PH
###############################################################

rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
library(MASS)
## True parameters
betaT = c(-1);
theta1 = 0.98; theta2 = 100000;
## generate coordinates:
## npred is the # of locations for prediction
n = 100; npred = 30; ntot = n + npred;
ldist = 100; wdist = 40;
s1 = runif(ntot, 0, wdist); s2 = runif(ntot, 0, ldist);
s = rbind(s1,s2); #plot(s[1,], s[2,]);
## Covariance matrix
corT = matrix(1, ntot, ntot);
for (i in 1:(ntot-1)){
for (j in (i+1):ntot){
dij = sqrt(sum( (s[,i]-s[,j])^2 ));
corT[i,j] = theta1*exp(-theta2*dij);
corT[j,i] = theta1*exp(-theta2*dij);
}
}
## Generate x
x = runif(ntot,-1.5,1.5);
## Generate transformed log of survival times
z = mvrnorm(1, rep(0, ntot), corT);
## The CDF of Ti: Lambda(t) = t;
Fi = function(t, xi){
res = 1-exp(-t*exp(sum(xi*betaT)));
res[which(t<0)] = 0;
res
}
## The pdf of Ti:
fi = function(t, xi){
res=(1-Fi(t,xi))*exp(sum(xi*betaT));
res[which(t<0)] = 0;
res
}
#integrate(function(x) fi(x, 0), -Inf, Inf)
## true plot
xx = seq(0, 10, 0.1)
#plot(xx, fi(xx, -1), "l", lwd=2, col=2)
#lines(xx, fi(xx, 1), "l", lwd=2, col=3)

## The inverse for CDF of Ti
Finvsingle = function(u, xi) {
res = uniroot(function (x) Fi(x, xi)-u, lower=0, upper=5000);
res$root
}
Finv = function(u, xi) {sapply(u, Finvsingle, xi)};

## Generate survival times t
u = pnorm(z);
t = rep(0, ntot);
for (i in 1:ntot){
t[i] = Finv(u[i], x[i]);
}
tTrue = t; #plot(x,t);
1

There are 1 answers

0
Haiming Zhou On BEST ANSWER

Actually, the data are generated in the framework of spatial copula Cox PH model. It is helpful to read Section 4.1 of the supplemental material of Zhou et al. (2015). As you are fitting non-spatial PH model, the data generating procedure can be sampled without the use of s1 and s2; see the new example at https://stats.stackexchange.com/questions/253368/bayesian-survival-analysis.

In this new example, f0oft(t) and S0oft(t) are baseline survival functions, respectively. Given the subject with covariates x, Sioft(t,x) and fioft(t,x) are the survival and density for that subject. Finv(u,x) is the inverse function for Fioft(t,x)=1-Sioft(t,x), that is, Finv(u,x) is the solution to Fioft(t,x)=u w.r.t t.

To generate the survival data, we can first generate the covariates:

    x1 = rbinom(ntot, 1, 0.5); x2 = rnorm(ntot, 0, 1); X = cbind(x1, x2);

Given each covariate vector X, true survival time tT can be generated as

    u = runif(ntot);
    tT = rep(0, ntot);
    for (i in 1:ntot){
      tT[i] = Finv(u[i], X[i,]);
    }

Here the rationale behind is that if T|x ~ F(t,x), then F(T,x) ~ Uniform(0,1).