I am trying to use `speedglm` to achieve faster GLM estimation than `glm`, but why is it even slower?
set.seed(0)
n <- 1e3
p <- 1e3
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)
ptm <- proc.time()
fit <- glm(y ~ x, family = binomial())
print(proc.time() - ptm)
# user system elapsed
# 10.71 0.07 10.78
library(speedglm)
ptm <- proc.time()
fit <- speedglm(y ~ x, family = binomial())
print(proc.time() - ptm)
# user system elapsed
# 15.11 0.12 15.25
The efficiency gain of `speedglm` over `glm` lies in the way it reduces the `n * p` model matrix to a `p * p` matrix. However, if you have `n = p`, there is no effective reduction. What you really want to check is the `n >> p` case.

More insight comes from the computational complexity of each iteration of Fisher scoring:
`glm`, using a QR factorization of the `n * p` model matrix, takes `2np^2 - (2/3)p^3` FLOPs per iteration, while `speedglm`, which forms the cross product of the `n * p` matrix and then QR-factorizes the resulting `p * p` matrix, takes `np^2 + (4/3)p^3` FLOPs. So when `n >> p`, `speedglm` needs only about half the computation of `glm`. Furthermore, the blocking and caching strategy used by `speedglm` makes better use of the hardware, giving high performance.

If you have `n = p`, you immediately see that `glm` takes `(4/3)p^3` FLOPs per iteration, while `speedglm` takes `p^3 + (4/3)p^3` FLOPs, which is more expensive! In fact, the matrix cross product becomes sheer overhead in this case!
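You can plug the two per-iteration FLOP formulas in numerically to see both regimes; a small sketch (the sizes `1e5` and `50` are just illustrative choices, not anything special):

```r
# Per-iteration FLOP counts quoted above
flop_glm      <- function(n, p) 2 * n * p^2 - (2 / 3) * p^3  # QR of the n * p matrix
flop_speedglm <- function(n, p) n * p^2 + (4 / 3) * p^3      # cross product + QR of p * p matrix

# n >> p: speedglm does roughly half the work of glm
print(flop_speedglm(1e5, 50) / flop_glm(1e5, 50))

# n = p: the ratio is (7/3) / (4/3) = 1.75, i.e. speedglm is more expensive
print(flop_speedglm(1e3, 1e3) / flop_glm(1e3, 1e3))
```

The first ratio is about 0.5 and the second is exactly 1.75, matching the conclusions above.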
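To see the `n >> p` regime in practice, here is the question's benchmark rerun with a tall, thin model matrix (the sizes are illustrative, and actual timings depend on your machine and BLAS, so I won't promise numbers):

```r
library(speedglm)

set.seed(0)
n <- 1e5   # n >> p this time
p <- 50
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

# glm: QR factorization of the full n * p model matrix in each iteration
t_glm <- system.time(fit_glm <- glm(y ~ x, family = binomial()))

# speedglm: one n * p cross product, then QR of a small p * p matrix
t_speedglm <- system.time(fit_speedglm <- speedglm(y ~ x, family = binomial()))

print(rbind(glm = t_glm, speedglm = t_speedglm))
```

Both fits converge to the same MLE; the comparison is purely about the cost per Fisher-scoring iteration.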