Using predict for more complicated predictions

108 views Asked by At

I'm a bit lost on whether there's a simpler way to use predict to get out predictions from more complicated regression frameworks.

Take, for example, the following:

NN<-1e4
data<-data.table(trt=sample(paste("Treatment",1:3),NN,T),
                 qtl=sample(paste0("Q",1:2),NN,T),
                 grp=sample(4,NN,T),
                 cat=sample(paste("Category",LETTERS[1:3]),NN,T),
                 val=rnorm(NN,10)^2)

data[,out:=140+5*(trt=="Treatment 2")+3*(trt=="Treatment 3")+
       8*(qtl=="Q2")-4*(trt=="Treatment 2"&qtl=="Q2")+
       7*(trt=="Treatment 3"&qtl=="Q2")-4*(grp==2)+
       6*(grp==3)-10*(grp==4)-6*(cat=="Category B")+
       2*(cat=="Category C")-1.8*val+rnorm(NN,10)>0]

llog<-glm(out~trt*qtl+as.factor(grp)+cat+val,data=data,family=binomial(link="logit"))

Now, I want to get out of this predictions for the probability that out=1 by trt, by qtl, with all other predictors held at sample averages.

Let's say I want this, for this case, in a 3 x 2 matrix (or table, etc.) with rows corresponding to trt, columns to qtl

This is complicated in particular by the presence of factor variables--"held at sample averages" means we need to plug in the percentage of observations in each group, and I'm not sure how to do so in a clean way.

The really long way to do this, of course, is the following:

1) set up a vector of sample averages for the "other predictors":

oth.avg<-c(1,unlist(data[,lapply(list(grp==2,grp==3,grp==4,cat=="Category B",
                                  cat=="Category C",val),mean)]))

2) multiply with the corresponding coefficients

x.beta.oth<-sum(llog$coefficients[c(1,5:10)]*oth.avg)

3) set up "matrices" for trt, qtl and trtxqtl terms: (I say "matrices" because they're conceptually drawn from underlying matrices, but it's more concise to specify them in one dimension)

main.coef<-llog$coefficients[c(2:4,11:12)]
trt.mat<-rep(c(0,main.coef[1:2]),2)
qtl.mat<-rep(c(0,main.coef[3]),each=3)
tq.mat<-c(rep(0,3),rbind(rep(0,1),matrix(main.coef[4:5],ncol=1)))

(it's overkill to specify them like this in this small example, but the parsimony starts to show itself when the table is, say, 4x4.

4) add everything to get the predicted latent index

lat.pred<-trt.mat+qtl.mat+tq.mat+x.beta.oth

5) finally, transform these into predicted probabilities via the logistic formula: p/(1+p)

pred.prob<-matrix(exp(lat.pred)/(1+exp(lat.pred)),ncol=2,
              dimnames=list(paste("Treatment",1:3),c("Q1","Q2")))

                      Q1           Q2
Treatment 1 3.515452e-28 6.000161e-22
Treatment 2 1.989195e-24 1.519577e-27
Treatment 3 3.380796e-26 1.633097e-20

Am I missing something? I don't see what I could feed into predict to get this output...

0

There are 0 answers