Calculate survival p values for multiple variables

I have a long list of variables and I would like to test for differences in survival (p values) for each one of them. I use the survfit() and surv_pvalue() functions to get the result, but I'm having trouble looping over the variables:

library(survival)   # provides Surv() and survfit()
library(survminer)  # provides surv_pvalue()

set.seed(2020)
data <- data.frame(Months = 10 + rnorm(1:10),
                   Status = c(rep((0),5),rep((1),5)),
                   clin = rep("bla bla", 10),
                   Var1 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)),
                   Var2 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)),
                   Var3 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)))

fit.list <- list()

for (i in (4:ncol(data))){
  fit <- survfit(Surv(Months, Status) ~ colnames(data)[3+i], data = data)
  fit2 <- surv_pvalue(fit)
  fit.list[[i]] <- fit2
}

results in:

Error in model.frame.default(formula = Surv(Months, Status) ~ colnames(data)[3 +  : 
  variable lengths differ (found for 'colnames(data)[3 + i]')

I suspect this means there is a mismatch between the lengths of 4:ncol(data) and colnames(data)[3 + i], but how exactly should I specify them? Thank you in advance for any solutions!

There are 2 answers

Allan Cameron On BEST ANSWER

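You could use lapply over the column names instead of iterating and appending to a list. Iterate over the names, not the columns themselves: each column of data[4:6] is a vector of 0/1 values, so using it as an index into names(data) would select "Months" for every variable.

lapply(setNames(nm = names(data)[4:6]), function(v) {
  # Build Surv(Months, Status) ~ <v> and embed the formula itself in the call,
  # so that surv_pvalue() can recover it from the fitted object
  surv_pvalue(eval(call("survfit", 
                        formula = reformulate(v, "Surv(Months, Status)"), 
                        data = data)))
})
#> Returns a named list ($Var1, $Var2, $Var3); each element is a one-row
#> data frame with the columns variable, pval, method and pval.txt.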
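If only the p values are needed, the log-rank test can also be run directly with survdiff() from the survival package, without going through survfit() and surv_pvalue(). The following is a minimal sketch under assumptions: the toy data frame and the helper name logrank_p are made up for illustration, and the p value is computed from the chi-square statistic that survdiff() reports, which should correspond to the default log-rank method.

```r
library(survival)  # provides Surv() and survdiff()

# Hypothetical toy data (values typed by hand, not taken from the question)
toy <- data.frame(
  Months = c(5, 8, 12, 3, 9, 14, 7, 10, 6, 11),
  Status = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1),
  Var1   = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Var2   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Hypothetical helper: log-rank p value for one grouping variable,
# identified by its column name (a character string)
logrank_p <- function(var, data) {
  f  <- reformulate(var, response = "Surv(Months, Status)")
  sd <- survdiff(f, data = data)
  # Degrees of freedom = number of groups - 1
  pchisq(sd$chisq, df = length(sd$n) - 1, lower.tail = FALSE)
}

vars  <- c("Var1", "Var2")
pvals <- sapply(vars, logrank_p, data = toy)  # named numeric vector
print(pvals)
```

Because sapply() iterates over a character vector, the result is named after the variables, which makes it easy to turn into a table or to adjust with p.adjust().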

Nimzo On

Allan Cameron's answer is straightforward and works perfectly. I got it almost working with a plain for loop that indexes the columns directly:

library(survival)   # provides Surv() and survfit()
library(survminer)  # provides surv_pvalue()

set.seed(2020)
dat <- data.frame(Months = 10 + rnorm(1:10),
                  Status = c(rep((0),5),rep((1),5)),
                  clin = rep("bla bla", 10),
                  Var1 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)),
                  Var2 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)),
                  Var3 = sample(0:1, 10, replace=T,prob=c(0.5,0.5)))

fit.list <- list()

for (i in (4:length(dat))){
  fit <- survfit(Surv(Months, Status) ~ dat[,i], data = dat)
  fit2 <- surv_pvalue(fit)
  fit.list[[i]] <- fit2
}
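Two side effects of this loop may be what keeps it at "almost": assigning to fit.list[[i]] with i starting at 4 leaves the first three slots empty, and a formula written with dat[, i] carries that literal text as its term label, so the results cannot be told apart by variable name. The sketch below only illustrates these two points with base R; the object names (res, by_name, lab_index, lab_name) are made up for the example, and no model is fitted.

```r
# Assigning by position, starting at 4, pads slots 1 to 3 with NULL
res <- list()
for (i in 4:6) res[[i]] <- paste0("result", i)
n_null <- sum(vapply(res, is.null, logical(1)))

# A formula written with dat[, i] keeps that literal text as its term label
lab_index <- attr(terms(Surv(Months, Status) ~ dat[, i]), "term.labels")

# Building the formula from the column name keeps the name instead
lab_name <- attr(terms(reformulate("Var1", "Surv(Months, Status)")),
                 "term.labels")

# Storing by name (here: the formulas themselves) avoids the empty slots
vars <- c("Var1", "Var2", "Var3")
by_name <- list()
for (v in vars) by_name[[v]] <- reformulate(v, "Surv(Months, Status)")

print(length(res))     # 6
print(n_null)          # 3
print(lab_index)       # "dat[, i]"
print(lab_name)        # "Var1"
print(names(by_name))
```

Looping over names(dat)[4:ncol(dat)] and storing each result under its name therefore gives a list without gaps whose entries are labelled by variable.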