Correct specification of an upper level predictor variable within a mixed effects (hierarchical) model using glmer() in R


I am trying to fit a mixed-effects (hierarchical) logistic regression using glmer() from the lme4 package in R.

The dataset is a pooled cross-section of individual survey responses geographically nested within regions collected across 4 points in time. I am interested in introducing and exploring the significance of an upper level predictor (i.e. a macro-economic variable with unique values for each region) on an individual level outcome (a binary variable) alongside various other lower level predictors (individual characteristics).

I joined the regional data to the individual survey responses by region and year. As a result, any individual in "region 1" in "year 1" has an identical value for the upper level predictor.
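The join described above could look roughly like the following sketch. The toy data frames and the column names `id`, `region`, `year` and `binary_y` are hypothetical stand-ins, not the actual survey; only `pooled_survey` and `region.level.predictor` are borrowed from the model call further down.

```r
# Hypothetical toy data; column names are illustrative, not the real survey's.
survey <- data.frame(
  id       = 1:6,
  region   = c("r1", "r1", "r2", "r2", "r1", "r2"),
  year     = c(1, 1, 1, 1, 2, 2),
  binary_y = c(0, 1, 1, 0, 1, 1)
)
regional <- data.frame(
  region = c("r1", "r2", "r1", "r2"),
  year   = c(1, 1, 2, 2),
  region.level.predictor = c(4.2, 5.1, 4.6, 5.0)
)

# Left join: keep every survey row and attach the matching region-year value.
pooled_survey <- merge(survey, regional, by = c("region", "year"), all.x = TRUE)

# Check: the predictor takes exactly one value within each region-year cell.
values_per_cell <- aggregate(region.level.predictor ~ region + year,
                             data = pooled_survey,
                             FUN = function(v) length(unique(v)))
stopifnot(all(values_per_cell$region.level.predictor == 1))

# Number of distinct region-year cells, i.e. distinct predictor observations.
n_cells <- nrow(unique(pooled_survey[, c("region", "year")]))
```

Counting the cells this way makes explicit how many distinct values of the upper-level predictor the model actually sees, regardless of how many individual rows there are.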

I've been confused so far by the outcome of the mixed-effects model. Plotting the data would suggest a positive significant fixed effect for this regional-level-predictor but the result from the mixed model is instead weak and negative.
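One way to make the comparison between the plot and the model more concrete is to aggregate to region-year cells and look at the association both pooled across cells and within each region. The sketch below uses simulated data with made-up values (the outcome is generated independently of the predictor), so it only illustrates the mechanics and says nothing about the real result.

```r
set.seed(42)

# Hypothetical toy data: 5 regions x 4 years, one predictor value per region-year cell.
cells <- expand.grid(region = paste0("r", 1:5), year = 1:4)
cells$region.level.predictor <- rnorm(nrow(cells), mean = 5, sd = 1)

# 30 simulated respondents per cell, with made-up survey weights (pwt).
pooled_survey <- cells[rep(seq_len(nrow(cells)), each = 30), ]
pooled_survey$pwt <- runif(nrow(pooled_survey), 0.5, 2)
pooled_survey$binary_y <- rbinom(nrow(pooled_survey), 1, 0.3)

# Weighted share of binary_y == 1 in each region-year cell.
cell_id <- interaction(pooled_survey$region, pooled_survey$year, drop = TRUE)
cell_summary <- do.call(rbind, lapply(split(pooled_survey, cell_id), function(d) {
  data.frame(
    region    = d$region[1],
    year      = d$year[1],
    predictor = d$region.level.predictor[1],
    prop_y    = weighted.mean(d$binary_y, d$pwt)
  )
}))

# Pooled association across all cells (mixes between-region and over-time variation).
pooled_cor <- cor(cell_summary$predictor, cell_summary$prop_y)

# Association within each region over time (roughly the variation that remains
# once region intercepts are in the model).
within_cor <- sapply(split(cell_summary, cell_summary$region),
                     function(d) cor(d$predictor, d$prop_y))
```

Calling `plot(cell_summary$predictor, cell_summary$prop_y, col = as.integer(cell_summary$region))` would draw the cell-level scatter with one colour per region, which makes it easier to see whether a pooled trend and the within-region trends point the same way.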

I ran a model with glmer() with the upper-level predictor included as a fixed effect alongside various categorical individual-level predictors. I also posited random effects for region and year. It is my understanding that by modelling random regional effects, I can include an upper-level (regional) predictor and thereby separate out the effect of the (regional) upper-level predictor from unobserved regional heterogeneity. I included the survey weights in the model.

library(lme4)

# binary_y: 0/1 outcome; pwt: survey weights
model <- glmer(binary_y ~ region.level.predictor +
                   [various categorical individual level predictors] +
                   (1|year) +
                   (1|region),
                 data = pooled_survey,
                 family = binomial(), weights = pwt,
                 nAGQ = 0)
#result
Random effects:
 Groups Name        Variance Std.Dev.
 region (Intercept) 0.12283  0.3505  
 year   (Intercept) 0.01306  0.1143  
Number of obs: 335319, groups:  region, 10; year, 4

Fixed effects:
                               Estimate Std. Error  z value Pr(>|z|)
(Intercept)                  -2.2172042  0.1247093  -17.779  < 2e-16 ***
upper.level.predictor        -0.0507120  0.0006980  -72.650  < 2e-16 ***
individual.predictor1.level1  2.0791319  0.0008582 2422.778  < 2e-16 ***
individual.predictor1.level2 -1.3189206  0.0018920 -697.091  < 2e-16 ***
individual.predictor2.level1  0.0187660  0.0011768   15.946  < 2e-16 ***
individual.predictor2.level2  0.0544561  0.0012773   42.635  < 2e-16 ***
individual.predictor2.level3 -0.4276550  0.0014786 -289.226  < 2e-16 ***

The result shows a significant but weak negative effect for the (regional) upper-level predictor, yet plotting the dependent variable against the predictor at the regional level suggests a positive relationship. Converting the estimate to an odds ratio gives a value close to 1 (0.95), which suggests to me that I might be doing something wrong in the way I'm defining the model. (Mean-centering or minimum-centering the regional predictor produces almost identical results.)
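For reference, the odds-ratio conversion mentioned above can be reproduced as in this minimal sketch. The coefficient and standard error are copied from the fixed-effects output; the 10-unit rescaling is a hypothetical illustration, since the units of the predictor are not stated here.

```r
# Coefficient and standard error copied from the fixed-effects output.
beta <- -0.0507120
se   <- 0.0006980

# Odds ratio for a one-unit increase in the predictor (about 0.95).
odds_ratio <- exp(beta)

# 95% Wald interval, computed on the log-odds scale and then exponentiated.
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

# Hypothetical: odds ratio for a 10-unit increase in the predictor.
odds_ratio_10 <- exp(10 * beta)

round(c(OR = odds_ratio, lower = wald_ci[1], upper = wald_ci[2],
        OR_10 = odds_ratio_10), 4)
```

Because the odds ratio refers to a one-unit change, how close it sits to 1 depends on the scale the predictor is measured in, so the magnitude is easier to judge once it is expressed per a substantively meaningful change.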

The significance, direction and values for the individual level-predictors are as expected.

The modelled regional random effects account for some of the variance and seem "justifiable" and theoretically plausible. Random effects for year were included to account for temporal autocorrelation within survey years, but the large sample size means these effects are close to zero.

My question is:

  1. Is this the correct way to include a regional-level predictor in a two-level model applied to a pooled cross-section of survey responses, or am I overlooking something?

My understanding is that I should add random slopes for the predictor alongside the random intercepts, e.g. (region.level.predictor||region), only if I expect the predictor's effect to vary across regions, that is, to interact with the unobserved regional heterogeneity. Whilst there may be some theoretical case for this, I expect the relationship to be broadly universal.
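To make the two specifications concrete, here is a minimal sketch on simulated data (all values are made up, and the individual-level predictors and survey weights are left out for brevity). In lme4, a double-bar term such as `(x || g)` expands to `(1 | g) + (0 + x | g)`, so in the sketch it replaces the `(1 | region)` term rather than being added next to it. The fit is guarded so the formulas can be inspected even where lme4 is not installed.

```r
set.seed(1)

# Hypothetical simulated data: 10 regions x 4 years, one predictor value per
# region-year cell, 50 respondents per cell.
cells <- expand.grid(region = factor(1:10), year = factor(1:4))
cells$region.level.predictor <- rnorm(nrow(cells))
sim <- cells[rep(seq_len(nrow(cells)), each = 50), ]
sim$binary_y <- rbinom(nrow(sim), 1,
                       plogis(-1 + 0.3 * sim$region.level.predictor))

# Random intercepts only: the predictor's effect is assumed common to all regions.
f_intercept <- binary_y ~ region.level.predictor + (1 | year) + (1 | region)

# Uncorrelated random intercept and random slope: the effect may vary by region.
f_slope <- binary_y ~ region.level.predictor + (1 | year) +
  (1 + region.level.predictor || region)

if (requireNamespace("lme4", quietly = TRUE)) {
  m_intercept <- lme4::glmer(f_intercept, data = sim,
                             family = binomial(), nAGQ = 0)
  m_slope     <- lme4::glmer(f_slope, data = sim,
                             family = binomial(), nAGQ = 0)
  # Likelihood-ratio comparison of the two specifications.
  print(anova(m_intercept, m_slope))
}
```

With only a handful of regions, a random-slope variance is hard to estimate, so a singular fit or a slope variance near zero in such a comparison would not be surprising.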

  2. Given that there appears to be low temporal autocorrelation in the pooled (or "repeated") cross-section, is there another way I can include the time element of this type of data in a mixed model?

I hope this is a well defined question. I have used the terminology from economic geography to describe the data and can provide alternative definitions if anything is unclear. Thanks in advance for any advice in making sense of this confusing result!
