Hi, I have data on student drop-outs and I want to run a survival analysis to examine or predict the probability of drop-out at a given grade. The challenge is that I want to group grades together, for example (7,8), (9,10), (11,12).
Here is my reproducible example. This is the data I have now:
data <- data.frame(STUDENT=c(1,1,1,1,2,2,2,2,3,3,3,3),
GRADE=c(9,10,11,12,7,8,9,10,9,10,11,12),
DROPOUT=c(0,0,0,0,0,0,1,1,0,0,0,1))
I made the data tall (one row per student per grade). For example, STUDENT=1 never dropped out, STUDENT=2 dropped out in the 9th grade, and STUDENT=3 dropped out in the 12th grade.
Here is my basic survival analysis approach:
attach(data)
survivalmodel <- Surv(time=GRADE,event=DROPOUT)
Do I need time2 = ? Could you say how important it is to have this, and how it would be measured? I am self-taught and still reading.
So my question is: how do I get drop-out probabilities for the GRADE bands (7,8), (9,10), (11,12), so that I ultimately have a separate probability of student drop-out for GRADES 7 and 8, for GRADES 9 and 10, and for GRADES 11 and 12?
time (what you were calling time1) should be the first observed grade attended. (I'm assuming that for any given school there would be new students transferring in.) time2 should be either the grade at which a dropout occurs or 12. Event should be as you have it, except you should not have duplicates. Line 8 should be deleted. You should construct a new dataframe that has 4 columns and three rows (one for each student).
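As a rough sketch of what that could look like with the example data (the names students, entry, exit_grade, event and surv_obj are mine, not from the question, and I am assuming base R plus the survival package), something like this collapses the tall data to one row per student and builds the three-argument Surv object:

```r
library(survival)

# Example data from the question (one row per student per grade)
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# time: first observed grade for each student
entry <- tapply(data$GRADE, data$STUDENT, min)

# event: 1 if the student ever dropped out, 0 otherwise
event <- tapply(data$DROPOUT, data$STUDENT, max)

# time2: grade of the first recorded dropout, or 12 if none
# (12 follows the suggestion above; this ignores any duplicate
# dropout rows after the first one)
exit_grade <- sapply(split(data, data$STUDENT), function(d) {
  if (any(d$DROPOUT == 1)) min(d$GRADE[d$DROPOUT == 1]) else 12
})

# One row per student, four columns
students <- data.frame(STUDENT = as.numeric(names(entry)),
                       time    = as.numeric(entry),
                       time2   = as.numeric(exit_grade),
                       event   = as.numeric(event))

# Counting-process form: at risk from time (entry) to time2 (exit)
surv_obj <- Surv(time = students$time,
                 time2 = students$time2,
                 event = students$event)
```

Note that Surv() expects time2 to be strictly greater than time, so a student who enters and leaves in the same grade would need separate handling; that case does not occur in this example.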