Survival analysis to get dropout rates


Hi, I have data on student dropouts, and I want to run a survival analysis (I believe that is the right tool) to estimate or predict the probability of dropping out at a given grade. The challenge is that I want to group grades into bands, for example (7, 8), (9, 10) and (11, 12).

Here is a reproducible example. This is the data I have now:

data <- data.frame(STUDENT=c(1,1,1,1,2,2,2,2,3,3,3,3),
                  GRADE=c(9,10,11,12,7,8,9,10,9,10,11,12),
                  DROPOUT=c(0,0,0,0,0,0,1,1,0,0,0,1))

The data are in long ("tall") format: STUDENT=1 never dropped out, STUDENT=2 dropped out in 9th grade, and STUDENT=3 dropped out in 12th grade.
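Note that STUDENT=2 has DROPOUT=1 in both grade 9 and grade 10, so the same dropout is recorded twice. A minimal base-R sketch of one way to keep each student's rows only up to the first recorded dropout; the approach (and the `clean` name) is an illustrative assumption, not part of the question:

```r
# Example data from the question (one row per student per grade)
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Order by student and grade so earlier grades come first
data <- data[order(data$STUDENT, data$GRADE), ]

# For each row, count the dropouts recorded *before* it for the same student,
# then keep only rows where that count is zero (i.e. up to the first dropout)
prior_events <- ave(data$DROPOUT, data$STUDENT,
                    FUN = function(x) cumsum(x) - x)
clean <- data[prior_events == 0, ]
clean
```

This drops only the grade-10 row for student 2, leaving 11 rows.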

Here is my basic survival-analysis approach:

library(survival)
attach(data)
survivalmodel <- Surv(time=GRADE,event=DROPOUT)

Do I need `time2`? How important is it, and how would it be measured? I am self-taught and still reading up on this.
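In `survival::Surv()`, supplying `time2` switches to counting-process form: `time` becomes the start and `time2` the end of the interval (start, stop] during which the student is at risk. That is how you record that a student was only observed from grade 9 onward. Without `time2`, every subject is treated as at risk from time 0. A small sketch using the three students from the example, with the start/stop values copied by hand (an assumption, not computed from `data`):

```r
library(survival)

# One value per student: first grade observed, last grade observed, dropout flag
start   <- c(9, 7, 9)
stop    <- c(12, 9, 12)
dropout <- c(0, 1, 1)

# Only `time`: each student is assumed to be at risk from time 0
s_right <- Surv(time = stop, event = dropout)

# `time` plus `time2`: each student is at risk only over (start, stop]
s_count <- Surv(time = start, time2 = stop, event = dropout)

attr(s_right, "type")  # "right"
attr(s_count, "type")  # "counting"
s_count
```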

So my question is: how do I get dropout probabilities for the grade bands (7, 8), (9, 10) and (11, 12), so that I end up with one probability of dropping out in grades 7 and 8, a separate one for grades 9 and 10, and a separate one for grades 11 and 12?
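One simple way to get a probability per band is a life table: within each band, divide the number of dropouts by the number of students observed in that band. That estimates the conditional probability of dropping out in the band given enrolment at its start; chaining (1 - probability) across bands gives the cumulative probability. A sketch, assuming a student counts as "at risk" in a band if they have any row in it; the cut points and the names `BAND`, `pb` and `band_levels` are illustrative choices:

```r
# Example data from the question
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Keep each student's rows only up to the first dropout (removes the duplicate)
data <- data[order(data$STUDENT, data$GRADE), ]
data <- data[ave(data$DROPOUT, data$STUDENT,
                 FUN = function(x) cumsum(x) - x) == 0, ]

# Assign grades to bands: (6,8] = 7-8, (8,10] = 9-10, (10,12] = 11-12
band_levels <- c("7-8", "9-10", "11-12")
data$BAND <- cut(data$GRADE, breaks = c(6, 8, 10, 12), labels = band_levels)

# One row per student per band: 1 if the student dropped out in that band
pb <- aggregate(DROPOUT ~ STUDENT + BAND, data = data, FUN = max)
band <- factor(pb$BAND, levels = band_levels)

at_risk  <- as.numeric(table(band))                      # students seen in band
dropouts <- as.numeric(tapply(pb$DROPOUT, band, sum))    # dropouts in band
hazard   <- dropouts / at_risk                           # conditional probability

# Cumulative probability of having dropped out by the end of each band
cum_dropout <- 1 - cumprod(1 - hazard)

data.frame(band = band_levels, at_risk, dropouts, hazard, cum_dropout)
```

With the example data this gives conditional probabilities of 0, 1/3 and 1/2 for grades 7-8, 9-10 and 11-12. Students first seen in grade 9 only contribute from the 9-10 band onward, and with three students the numbers are purely illustrative.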

Answer by IRTFM:

`time` (the start of the interval; your code currently passes GRADE here) should be the first grade the student was observed attending. (I'm assuming that at any given school some students transfer in partway through.) `time2` should be either the grade at which the dropout occurs or 12. The event indicator should be as you have it, except that you should not have duplicates: row 8 of your data (student 2 in grade 10, after they had already dropped out) should be deleted. You should then construct a new data frame with 4 columns and three rows (one for each student).
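The per-student table that the answer types in by hand can also be derived from `data`. A sketch, assuming the rows after each student's first dropout are discarded and using the answer's convention that `start` is the first grade observed:

```r
# Example data from the question
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Drop rows after each student's first dropout (removes row 8)
data <- data[order(data$STUDENT, data$GRADE), ]
data <- data[ave(data$DROPOUT, data$STUDENT,
                 FUN = function(x) cumsum(x) - x) == 0, ]

# One row per student; tapply groups by STUDENT in sorted order
sdat <- data.frame(
  STUDENT = sort(unique(data$STUDENT)),
  start   = as.vector(tapply(data$GRADE,   data$STUDENT, min)),  # first grade seen
  GRADE   = as.vector(tapply(data$GRADE,   data$STUDENT, max)),  # dropout grade or last seen
  DROPOUT = as.vector(tapply(data$DROPOUT, data$STUDENT, max))   # 1 if ever dropped out
)
sdat
```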

library(survival)
sdat <- read.table(text="STUDENT start GRADE DROPOUT
1 9 12 0
2 7 9 1
3 9 12 1", header=TRUE)
sdat
# NEVER use attach, but especially never with survival pkg functions

coxph( Surv(time=start, time2=GRADE, event=DROPOUT)~. , data=sdat[-1])
Call:  coxph(formula = Surv(time = start, time2 = GRADE, event = DROPOUT) ~ 
    ., data = sdat[-1])

Null model
  log likelihood= -0.6931472 
  n= 3
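With no covariates, `coxph()` can only fit the null model shown above, and in any case it estimates hazard ratios rather than probabilities. To get band-level probabilities, one option (a swapped-in technique, not the answerer's) is a Kaplan-Meier fit with `survfit()` on a band time scale. A sketch, assuming bands are numbered 1 = grades 7-8, 2 = 9-10, 3 = 11-12, and that `start` is the first observed band minus 1: with the answer's convention, a student who dropped out in the first grade they were observed would get start equal to stop, a zero-length interval that `Surv()` rejects.

```r
library(survival)

# Example data from the question, with rows after the first dropout removed
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))
data <- data[order(data$STUDENT, data$GRADE), ]
data <- data[ave(data$DROPOUT, data$STUDENT,
                 FUN = function(x) cumsum(x) - x) == 0, ]

# Band index: 1 = grades 7-8, 2 = grades 9-10, 3 = grades 11-12
data$BAND <- findInterval(data$GRADE, c(7, 9, 11))

# One row per student in counting-process form on the band scale
bdat <- data.frame(
  start   = as.vector(tapply(data$BAND,    data$STUDENT, min)) - 1,
  stop    = as.vector(tapply(data$BAND,    data$STUDENT, max)),
  dropout = as.vector(tapply(data$DROPOUT, data$STUDENT, max))
)

# Kaplan-Meier with delayed entry: students first seen in grade 9 join in band 2
fit <- survfit(Surv(start, stop, dropout) ~ 1, data = bdat)
summary(fit)

# Cumulative probability of having dropped out by the end of each band
s <- summary(fit, times = 1:3)
data.frame(band = c("7-8", "9-10", "11-12")[s$time], cum_dropout = 1 - s$surv)
```

For this toy data the estimated survival is 2/3 after band 2 and 1/3 after band 3, matching a hand-computed life table.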