Plotting results with missing categories in interaction with emmeans


I have quite messy data. I have a model with an interaction between two factors, and I want to plot it:

f1 <- structure(list(tipo = c("digitables", "digitables", "digitables", 
"digitables", "digitables", "digitables", "digitables", "digitables", 
"payments", "payments", "payments", "payments", "payments", "payments", 
"payments", "payments", "traditionals", "traditionals", "traditionals", 
"traditionals", "traditionals", "traditionals", "traditionals", 
"traditionals"), categoria = c("Advice", "Digital banks", "Exchange", 
"FinTech", "Insurance", "Investments", "Lending", "Payments and transfers", 
"Advice", "Digital banks", "Exchange", "FinTech", "Insurance", 
"Investments", "Lending", "Payments and transfers", "Advice", 
"Digital banks", "Exchange", "FinTech", "Insurance", "Investments", 
"Lending", "Payments and transfers"), Total = c(63L, 450L, 279L, 
63L, 36L, 108L, 567L, 549L, 63L, 450L, 279L, 63L, 36L, 108L, 
567L, 549L, 35L, 250L, 155L, 35L, 20L, 60L, 315L, 305L), Frequencia = c(44L, 
266L, 118L, 9L, 14L, 45L, 134L, 242L, 33L, 68L, 2L, 10L, 3L, 
8L, 11L, 78L, 27L, 226L, 142L, 10L, 20L, 45L, 300L, 245L), Perc = c(69.84, 
59.11, 42.29, 14.29, 38.89, 41.67, 23.63, 44.08, 52.38, 15.11, 
0.72, 15.87, 8.33, 7.41, 1.94, 14.21, 77.14, 90.4, 91.61, 28.57, 
100, 75, 95.24, 80.33), Failure = c(19L, 184L, 161L, 54L, 22L, 
63L, 433L, 307L, 30L, 382L, 277L, 53L, 33L, 100L, 556L, 471L, 
8L, 24L, 13L, 25L, 0L, 15L, 15L, 60L)), row.names = c(NA, -24L
), class = "data.frame")
# Packages
library(dplyr)
library(ggplot2)
library(emmeans) # version 1.4.8 or 1.5.1
# Works as expected
m1 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
          data = f1, family = binomial(link = "logit"))
l1 <- emmeans(m1, ~categoria|tipo)
plot(l1, type = "response",
     comparison = TRUE,
     by = "categoria")

[Image: plot of the estimated probabilities with comparison arrows, one panel per categoria]

Using by = "tipo" instead results in an error:

# Doesn't work:
plot(l1, type = "response",
     comparison = TRUE,
     by = "tipo")
Error: Aborted -- Some comparison arrows have negative length!
In addition: Warning message:
Comparison discrepancy in group digitables, Advice - Insurance:
    Target overlap = -0.0241, overlap on graph = 0.0073 

If I use comparison = F, as suggested by the explanation supplement vignette, it works. However, it then does not show the arrows, which are very important to me.

Q1 - Is there a workaround for this? (Or is it impossible because of my data?)

As the plot above shows, one cell has an estimated probability of 1 (categoria = Insurance and tipo = traditionals). So I deleted only that row from my data frame and tried to redo the plot, which gives:

f1 <- f1 %>% 
  filter(!Perc ==100)
m1 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
          data = f1, family = binomial(link = "logit"))
l1 <- emmeans(m1, ~categoria|tipo)
plot(l1, type = "response",
     comparison = TRUE,
     by = "categoria")
Error in if (dif[i] > 0) lmat[i, id1[i]] = rmat[i, id2[i]] = wgt * v1[i] else rmat[i,  : 
  missing value where TRUE/FALSE needed

Q2 - How can I plot my results when one level of a factor is missing within a level of the other factor? I would expect the Insurance facet to show only the payments and digitables levels (while the other facets remain the same).

Answer by Russ Lenth (best answer)

First, please don't ever re-use the same variable names for more than one thing; that makes things not reproducible. If you modify a dataset, or a model, or whatever, give it a new name so it can be distinguished.

Q1

As documented, comparison arrows cannot always be computed, and this is one such case. I suggest displaying the results some other way, e.g. using pwpp() or pwpm().
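As a rough sketch of what that could look like (not part of the original answer): the block below rebuilds only the columns of the question's data that the model needs, refits the same binomial GLM, and then calls pwpp() and pwpm() on the emmeans grid. The object names dat, fit, emm, p, and pm are placeholders chosen for this illustration, and type = "response" is assumed to be passed through to the summary as it is for plot().

```r
library(emmeans)

# Same counts as in the question's f1, rebuilt compactly (names are illustrative)
cats <- c("Advice", "Digital banks", "Exchange", "FinTech", "Insurance",
          "Investments", "Lending", "Payments and transfers")
dat <- data.frame(
  tipo       = rep(c("digitables", "payments", "traditionals"), each = 8),
  categoria  = rep(cats, times = 3),
  Frequencia = c(44, 266, 118, 9, 14, 45, 134, 242,
                 33, 68, 2, 10, 3, 8, 11, 78,
                 27, 226, 142, 10, 20, 45, 300, 245),
  Failure    = c(19, 184, 161, 54, 22, 63, 433, 307,
                 30, 382, 277, 53, 33, 100, 556, 471,
                 8, 24, 13, 25, 0, 15, 15, 60)
)

# Same model as m1 in the question; glm() may warn about fitted
# probabilities of 0 or 1 because of the Insurance/traditionals cell
fit <- glm(cbind(Frequencia, Failure) ~ tipo * categoria,
           data = dat, family = binomial(link = "logit"))
emm <- emmeans(fit, ~ categoria | tipo)

# Pairwise P-value plot: one panel per tipo, no comparison arrows needed
p <- pwpp(emm, type = "response")
print(p)

# Matrix display of estimates, pairwise comparisons, and P values, per tipo
pm <- pwpm(emm, type = "response")
print(pm)
```

Neither display relies on arrow lengths, so both should remain usable in groups where plot(..., comparison = TRUE) aborts.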

Q2

There was a bug in handling missing cases. This has been fixed in the GitHub version:

f2 <- f1 %>% 
    filter(!Perc ==100)
m2 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
          data = f2, family = binomial(link = "logit"))
l2 <- emmeans(m2, ~categoria|tipo)

plot(l2, type = "response",
     comparison = TRUE,
     by = "categoria")

[Image: plot of l2 with comparison arrows, one panel per categoria]

plot(l2, type = "response",
     comparison = TRUE,
     by = "tipo")

## Error: Aborted -- Some comparison arrows have negative length!
## (in group "payments")