fitted values from speedglm() look very different from fitted values with glm()

The fitted values returned by speedglm() look very different from those returned by glm(), and I don't know why. For example, if I run this:

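# Assumption: lalonde is the version shipped with the Matching package
# (445 rows, with the hisp and nodegr columns used below)
data("lalonde", package = "Matching")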
glm <- glm(married ~ treat + age + educ + black + hisp + nodegr, data = lalonde, family = "binomial")
fitted_vals <- glm$fitted.values

I get broadly what I'd expect: one fitted value per observation, each between 0 and 1 (the two possible values of married). E.g.

skimr::skim(fitted_vals)

── Data Summary ────────────────────────
                           Values     
Name                       fitted_vals
Number of rows             445        
Number of columns          1          
_______________________               
Column type frequency:                
  numeric                  1          
________________________              
Group variables            None       

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean     sd     p0   p25   p50   p75  p100 hist 
1 data                  0             1 0.169 0.0913 0.0378 0.105 0.147 0.205 0.627 ▇▅▁▁▁

However, if I run the same model using speedglm(), I get quite different results:

library(speedglm)
speedglm <- speedglm(married ~ treat + age + educ + black + hisp + nodegr, data = lalonde, family = binomial(), fitted = TRUE)
fitted_vals <- speedglm$linear.predictors

skimr::skim(fitted_vals)

── Data Summary ────────────────────────
                           Values     
Name                       fitted_vals
Number of rows             445        
Number of columns          1          
_______________________               
Column type frequency:                
  numeric                  1          
________________________              
Group variables            None       

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
1 data                  0             1 -1.71 0.606 -3.24 -2.14 -1.76 -1.35 0.521 ▂▇▇▂▁

Does anyone know what's going on here? According to the documentation, linear.predictors seems to be the analogue of glm's fitted.values. As far as I understand, it shouldn't be possible to get fitted values outside the range of the dependent variable, but that is clearly what's happening.

Answer by Ben Bolker (best answer)

"Linear predictors" are not the same as "fitted values", unless a GLM is fitted with an identity link. In general the linear predictor is eta = b0 + b1*x1 + b2*x2 + ..., while the fitted value is mu = linkinv(eta), where linkinv is the inverse link function (e.g. logistic or inverse-logit in this case).
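A minimal sketch of that distinction, using base R's built-in mtcars data rather than the lalonde data from the question (the model am ~ wt is an arbitrary choice made only for illustration):

```r
# Illustrative logistic regression on a built-in dataset (not the question's data)
fit <- glm(am ~ wt, data = mtcars, family = binomial())

eta <- predict(fit, type = "link")   # linear predictor: any real number
mu  <- fitted(fit)                   # fitted value: a probability in (0, 1)

range(eta)                           # not restricted to [0, 1]
range(mu)                            # always inside (0, 1)

# For the logit link, the inverse link is the logistic function plogis()
all.equal(plogis(eta), mu)

# The same inverse link is stored in the model's family object
all.equal(unname(family(fit)$linkinv(eta)), unname(mu))
```

Both comparisons should return TRUE: the linear predictor is on the log-odds scale, and it only becomes a probability after the inverse link is applied.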

In general, it is safer to use accessor methods such as fitted() and predict(); that way you don't have to worry about how each package names its internal components:

## fitted values (data scale)
all.equal(fitted(glm), fitted(speedglm))  ## TRUE
## predicted values (linear-predictor scale)
all.equal(predict(glm), predict(speedglm))  ## TRUE
## predict(., type = "response") == fitted(.)
all.equal(predict(glm, type = "response"), fitted(speedglm)) ## TRUE
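As a rough sanity check that needs no packages, the quantiles of the speedglm linear predictors reported in the question can be pushed through the inverse logit by hand. Because the logistic function is monotone, the results should land close to the glm quantiles. The numbers below are copied from the two skim() outputs above, so they are only accurate to about three significant figures:

```r
# Quantiles of speedglm$linear.predictors, copied from the question's output
eta_q <- c(p0 = -3.24, p25 = -2.14, p50 = -1.76, p75 = -1.35, p100 = 0.521)

# Quantiles of glm$fitted.values, copied from the question's output
mu_q <- c(p0 = 0.0378, p25 = 0.105, p50 = 0.147, p75 = 0.205, p100 = 0.627)

# Inverse logit of the linear-predictor quantiles
back_transformed <- plogis(eta_q)

# Differences should be small, limited by the rounding in the printed output
round(back_transformed - mu_q, 4)
```

The differences are on the order of the rounding error in the printed summaries, which is consistent with the two models agreeing and differing only in the scale of the component that was extracted.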