I'm a bit lost on whether there's a simpler way to use predict
to get out predictions from more complicated regression frameworks.
Take, for example, the following:
NN<-1e4
data<-data.table(trt=sample(paste("Treatment",1:3),NN,T),
qtl=sample(paste0("Q",1:2),NN,T),
grp=sample(4,NN,T),
cat=sample(paste("Category",LETTERS[1:3]),NN,T),
val=rnorm(NN,10)^2)
data[,out:=140+5*(trt=="Treatment 2")+3*(trt=="Treatment 3")+
8*(qtl=="Q2")-4*(trt=="Treatment 2"&qtl=="Q2")+
7*(trt=="Treatment 3"&qtl=="Q2")-4*(grp==2)+
6*(grp==3)-10*(grp==4)-6*(cat=="Category B")+
2*(cat=="Category C")-1.8*val+rnorm(NN,10)>0]
llog<-glm(out~trt*qtl+as.factor(grp)+cat+val,data=data,family=binomial(link="logit"))
Now, I want to get out of this predictions for the probability that out
=1 by trt
, by qtl
, with all other predictors held at sample averages.
Let's say I want this, for this case, in a 3 x 2 matrix (or table, etc.) with rows corresponding to trt
, columns to qtl
This is complicated in particular by the presence of factor variables--"held at sample averages" means we need to plug in the percentage of observations in each group, and I'm not sure how to do so in a clean way.
The really long way to do this, of course, is the following:
1) set up a vector of sample averages for the "other predictors":
oth.avg<-c(1,unlist(data[,lapply(list(grp==2,grp==3,grp==4,cat=="Category B",
cat=="Category C",val),mean)]))
2) multiply with the corresponding coefficients
x.beta.oth<-sum(llog$coefficients[c(1,5:10)]*oth.avg)
3) set up "matrices" for trt
, qtl
and trt
xqtl
terms:
(I say "matrices" because they're conceptually drawn from
underlying matrices, but it's more concise to specify them
in one dimension)
main.coef<-llog$coefficients[c(2:4,11:12)]
trt.mat<-rep(c(0,main.coef[1:2]),2)
qtl.mat<-rep(c(0,main.coef[3]),each=3)
tq.mat<-c(rep(0,3),rbind(rep(0,1),matrix(main.coef[4:5],ncol=1)))
(it's overkill to specify them like this in this small example, but the parsimony starts to show itself when the table is, say, 4x4.
4) add everything to get the predicted latent index
lat.pred<-trt.mat+qtl.mat+tq.mat+x.beta.oth
5) finally, transform these into predicted probabilities via the logistic formula: p/(1+p)
pred.prob<-matrix(exp(lat.pred)/(1+exp(lat.pred)),ncol=2,
dimnames=list(paste("Treatment",1:3),c("Q1","Q2")))
Q1 Q2
Treatment 1 3.515452e-28 6.000161e-22
Treatment 2 1.989195e-24 1.519577e-27
Treatment 3 3.380796e-26 1.633097e-20
Am I missing something? I don't see what I could feed into predict
to get this output...