Exposure variable in logistic regression


I have a data frame containing characteristics of clients and contracts, plus a 0/1 indicator showing whether a fall happened during the period between 2008 and 2017. I'm using a binomial model to regress the probability of a fall on these characteristics. I have 38,000 different contracts.

So I'm using a binomial model like this (R code):

formule <- y ~ Niveau_gar_incapacite + Niv_indem_mens + Regrpt_franchise + Niveau_prime + Situation_familiale + Classe_age_chute + Grde_Region + Regrpt_strate + Taille_courtier + Commission + Retention + Anciennete + Regrpt_CSP + Regrpt_sinistres + Couplage

logit <- glm(Chute_commerciale~1, data=train, family=binomial(link="logit"))

selection_asc_AIC <- step(logit, direction="forward", trace=TRUE, k=2, scope=list(upper=formule))

After some tests for multicollinearity, I eliminated some variables and grouped some terms. I get this result:

[Image: results from GLM]

[Image: results from GLM 2]

These results do not look correct, judging by the null deviance and the residual deviance.

I suspect that my exposure variable is the problem. My contracts begin and end in different years, so the exposure can be, for example, 5.32 or 1.36 years, and the data are subject to truncation and censoring.

How can I handle this exposure variable in a binomial logistic regression? If I duplicate each row by its number of years of exposure, the observations are no longer independent.
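For illustration only, here is a minimal sketch of one way unequal exposure is sometimes handled in a binary model without duplicating rows: a complementary log-log link with `offset(log(exposure))`. It runs on simulated data, not on the contracts above; the names `x`, `exposure` and `fall` are hypothetical, and it assumes a constant yearly hazard, which may not hold for real contracts.

```r
# Simulated data: every name here (x, exposure, fall) is hypothetical.
set.seed(1)
n <- 5000
x <- rnorm(n)                        # one stand-in covariate
exposure <- runif(n, 0.5, 9)         # years each contract is observed
rate <- exp(-3 + 0.5 * x)            # assumed constant yearly hazard
fall <- rbinom(n, 1, 1 - exp(-rate * exposure))

# With a constant hazard, P(fall) = 1 - exp(-rate * exposure), so
# log(-log(1 - p)) = log(exposure) + b0 + b1 * x.
# The cloglog link with a log-exposure offset fits exactly this form.
fit <- glm(fall ~ x + offset(log(exposure)),
           family = binomial(link = "cloglog"))

coef(fit)  # estimates of b0 and b1 on the log-hazard scale
```

Each contract stays a single row, so the independence concern from duplicating rows does not arise. Whether the constant-hazard assumption is reasonable for these data would need to be checked separately.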
