NA coefficients produced by "clogit" function for interaction term

27 views Asked by Alessandro At 27 February 2024 at 09:31

I am an ornithologist currently conducting a study on habitat use and selection in three animal species in agricultural habitats. I have been using the survival package in R for my analyses, particularly the clogit function.

To provide more context, my dataset consists of over 2400 records, and each habitat use site is paired with a random control site. The paired observations (use site + a randomly control site) share the same code under the variable "group." For each habitat use site, I have collected information on three variables: the observed species (categorical factor with three levels, "species"), the vegetation height (continuous variable, "vegetation.height" in cm), and the type of vegetation structure (categorical factor with three levels, "vegetation.structure"). You can download it from: https://docs.google.com/spreadsheets/d/1B7xalNuI5IQLvEcAa-OaFESjwgJrP-uA/edit?usp=sharing&ouid=100938199232902549570&rtpof=true&sd=true

I am encountering a challenge with the clogit function when considering certain fixed terms in the model. When I construct the model with "vegetation.height + vegetation.height:species" as fixed terms, everything works smoothly. However, when I try a similar approach with "vegetation.structure + vegetation.structure:species," (both categorical factor with three levels) I encounter NA coefficients for the "type3" level of "vegetation.structure" in the interaction term.

library(readxl)
library(survival)
library(earth)

# import dataset
data <- read_excel("dataset.xlsx", sheet = "sheet1")

# run a clogit regression model to study influence of vegetation.height on use, by species
mod1 <- clogit(use ~ vegetation.height +
                     vegetation.height:species +
                     strata(group),
                     data=data)
summary(mod1)

# run a second clogit regression model
# study influence of vegetation.structure on use, by species

mod2 <- clogit(use ~ vegetation.structure +
                     vegetation.structure:species +
                     strata(group),
                     data=data)
summary(mod2)

# NA coefficients produced by clogit!

Here's the summary output of mod2:

> summary(mod2)
Call:
coxph(formula = Surv(rep(1, 2418L), use) ~ vegetation.structure + 
    vegetation.structure:species + strata(group), data = data, 
    method = "exact")

  n= 2418, number of events= 1209 

                                               coef exp(coef)  se(coef)      z Pr(>|z|)    
vegetation.structuretype2                 -2.400984  0.090629  0.507924 -4.727 2.28e-06 ***
vegetation.structuretype3                 -4.040649  0.017586  0.459174 -8.800  < 2e-16 ***
vegetation.structuretype1:speciesspecies2 -2.054953  0.128099  0.518891 -3.960 7.49e-05 ***
vegetation.structuretype2:speciesspecies2 -0.004336  0.995673  0.511864 -0.008  0.99324    
vegetation.structuretype3:speciesspecies2        NA        NA  0.000000     NA       NA    
vegetation.structuretype1:speciesspecies3 -1.322921  0.266356  0.605789 -2.184  0.02898 *  
vegetation.structuretype2:speciesspecies3 -1.015964  0.362053  0.378729 -2.683  0.00731 ** 
vegetation.structuretype3:speciesspecies3        NA        NA  0.000000     NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                                          exp(coef) exp(-coef) lower .95 upper .95
vegetation.structuretype2                   0.09063     11.034   0.03349   0.24525
vegetation.structuretype3                   0.01759     56.863   0.00715   0.04325
vegetation.structuretype1:speciesspecies2   0.12810      7.806   0.04633   0.35418
vegetation.structuretype2:speciesspecies2   0.99567      1.004   0.36510   2.71529
vegetation.structuretype3:speciesspecies2        NA         NA        NA        NA
vegetation.structuretype1:speciesspecies3   0.26636      3.754   0.08125   0.87320
vegetation.structuretype2:speciesspecies3   0.36205      2.762   0.17234   0.76058
vegetation.structuretype3:speciesspecies3        NA         NA        NA        NA

Concordance= 0.715  (se = 0.012 )
Likelihood ratio test= 531.8  on 6 df,   p=<2e-16
Wald test            = 221.7  on 6 df,   p=<2e-16
Score (logrank) test = 433.3  on 6 df,   p=<2e-16

I am puzzled by this issue and suspect that the regression might be considering the "type3" level of "vegetation.structure" as the reference, estimating coefficients as differences from this level. However, my goal would be to obtain estimates for all coefficients to understand the species comparisons.

I manipulated the dataset by introducing a significantly imbalanced number of observations for habitat use (1) in the "type3" level of "vegetation.structure". This manipulation did not resolve the issue, leading me to believe that the NA coefficients are not a result of an equal number of observations between habitat use and random conditions.

Moreover, during dataset manipulation, I observed instances where the model considers either "type1" or "type2" of "vegetation.structure" as the reference level, resulting in NA coefficients. This inconsistency adds to my confusion, as I am striving to obtain reliable estimates for all levels of "vegetation.structure" to facilitate species comparisons.

I greatly appreciate your expertise and was wondering if you could provide any insights or guidance on how to address this matter.

Original Q&A

TechQA.

NA coefficients produced by "clogit" function for interaction term

There are 0 answers

Related Questions in R

Related Questions in MODEL

Related Questions in NA

Related Questions in SURVIVAL

Popular Questions

Trending Questions