How to plot raw data (geom_point()) on an axis with a distribution (stat_halfeye) in R

In this fictitious example, I have predicted distributions for grasshoppers by age in months, but want to add raw data. How does one do this when working with rvar data?

I tried to convert my numerical data to rvar with an sd = 0, but failed.

Model:

Grasshopper_model <- brm(kJs ~ Hopped + Age + Hopped*Age + (1|ID),
    family = gaussian(),
    data = df)

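library(brms)       # brm()
library(dplyr)      # %>%, group_by(), mutate(), filter()
library(tidybayes)  # add_predicted_rvars()
library(ggplot2)
library(ggdist)     # stat_halfeye()
library(ggh4x)      # facet_wrap2()

num_ids <- 10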

df <- expand.grid(
    "Age" = c(3, 4, 5, 6, 7, 8, 9, 10, 11),
    "Year" = 2011,
    "ID" = 1:num_ids,  
    "Hopped" = c(0, 1)
)
condition <-expand.grid("Age" = (c(3, 4, 5, 6, 7,8, 9, 10, 11)),
                        Year = 2011,
                        ID = 1,
                        Hopped = c(0, 1)) 
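df <- df %>%
    group_by(ID) %>% 
    # draw one value per row; rnorm(1, ...) would recycle a single draw within each ID
    mutate(kJs = ifelse(Hopped == 0, rnorm(n(), mean = 3, sd = 2), rnorm(n(), mean = 15, sd = 2))) %>% 
    ungroup()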

Filtered_data <- df%>% 
    filter(!is.na(kJs),
           !is.na(Age),
           !is.na(Hopped),
           Age>2,
           Age < 12) %>% 
    mutate(Hopped= ifelse(Hopped== 1, "Yes", "No"),
           .prediction = kJs)

Grasshopper_model <- brm(kJs ~ Hopped + Age + Hopped*Age + (1|ID),
                         family = gaussian(),
                         data = df)

condition %>% 
    add_predicted_rvars(Grasshopper_model, allow_new_levels = TRUE, re_formula = NA, newdata = .) %>% 
    mutate(Hopped = factor(ifelse(Hopped == 1, "Yes", "No"), levels = c("No", "Yes"))) %>% 
    ggplot(aes(ydist = .prediction[,"kJs"], x = Hopped)) +
    stat_halfeye(aes(fill = factor(Age))) +
    # geom_point(data = Filtered_data , 
    #          aes(x =  as.factor(Hopped), y = .prediction), alpha = 0.1) +
    facet_wrap2(~ Age, strip = strip_colours, scales = "free_x") +
    labs(color = "Age", fill = "Age") +
    ylab("Predicted energy expenditure (kJ)") +
    xlab("Hopped") +
    theme(strip.background=element_blank(),
          panel.background = element_rect(fill = "white"),  
          axis.line = element_line(color = "black"),
          axis.text = element_text(face = "bold", size = 11),
          legend.text = element_text(size = 14), 
          legend.title = element_text(size = 14),
          strip.text = element_text(size = 14),
          axis.title = element_text(face = "bold", size = 15),
          legend.position = "none") +
    scale_color_manual(values = custom_palette) +
    scale_fill_manual(values = custom_palette) +
    geom_hline(yintercept = 0, linetype = "dashed", linewidth = .25)

On a similar note, I cannot plot any logistic regression predictions due to an error message:

Warning message: Computation failed in stat_slabinterval() Caused by error in bw.SJ(): ! sample is too sparse to find TD
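
A possible explanation for this message, offered as an assumption rather than a confirmed diagnosis of the model above: predictions from a logistic regression are draws of 0 and 1, and when most draws share one value the interquartile range is 0, which makes the Sheather-Jones bandwidth selector `bw.SJ()` fail. The sketch below reproduces that failure in base R with a made-up vector `x` (not data from the question) and shows that `bw.nrd0()` still returns a usable bandwidth.

```r
# Hypothetical binary draws: 90 zeros and 10 ones, so IQR(x) is 0.
x <- c(rep(0, 90), rep(1, 10))

# Capture the error message instead of stopping the script.
sj <- tryCatch(bw.SJ(x), error = function(e) conditionMessage(e))

# The rule-of-thumb selector falls back to sd(x) when the IQR is 0.
nrd0 <- bw.nrd0(x)

print(sj)
print(nrd0)
```

If this is the cause, a density is probably not the right summary for binary predictions anyway. Assuming a recent ggdist version, a histogram-based slab such as `stat_histinterval()` may be worth trying in place of `stat_halfeye()`.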

There is 1 answer

KellyForrester On

I should not have added all the additional information to the ggplot as it is unnecessary, but I did find a solution!

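Defining the aesthetics inside each geom function, rather than globally in ggplot(), makes the plot work. Aesthetics set in ggplot() are inherited by every layer, so geom_point() was presumably also trying to evaluate ydist = .prediction[,"kJs"] against Filtered_data, where .prediction is a plain numeric column: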

condition %>% 
    add_predicted_rvars(Grasshopper_model, allow_new_levels = TRUE, re_formula = NA, newdata = .) %>% 
    mutate(Hopped = factor(ifelse(Hopped == 1, "Yes", "No"), levels = c("No", "Yes"))) %>% 
    ggplot() + #leave the ggplot() function empty
    stat_halfeye(aes(ydist = .prediction[,"kJs"], x = Hopped, fill = factor(Age))) +
    geom_point(data = Filtered_data , 
             aes(x =  as.factor(Hopped), y = .prediction), alpha = 0.1) +
    facet_wrap2(~ Age, strip = strip_colours, scales = "free_x") +
    labs(color = "Age", fill = "Age") +
    ylab("Predicted energy expenditure (kJ)") +
    xlab("Hopped")
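
For anyone who wants to try the per-layer approach without fitting a brms model, here is a minimal sketch. It assumes the ggplot2, ggdist and posterior packages are installed. The `pred` and `raw` data frames are hypothetical stand-ins for the `add_predicted_rvars()` output and for `Filtered_data`, built with `posterior::rvar_rng()` and `rnorm()` rather than taken from the model in the question.

```r
library(ggplot2)
library(ggdist)
library(posterior)

set.seed(1)

# Stand-in for the predicted rvars: one distribution per Hopped level.
pred <- data.frame(
  Hopped = factor(c("No", "Yes"), levels = c("No", "Yes")),
  .prediction = rvar_rng(rnorm, 2, mean = c(3, 15), sd = 2)
)

# Stand-in for the raw observations: plain numeric values.
raw <- data.frame(
  Hopped = factor(rep(c("No", "Yes"), each = 20), levels = c("No", "Yes")),
  kJs = c(rnorm(20, mean = 3, sd = 2), rnorm(20, mean = 15, sd = 2))
)

# Each layer gets its own data and aesthetics, so geom_point()
# never sees the ydist mapping meant for the rvar column.
p <- ggplot() +
  stat_halfeye(data = pred, aes(x = Hopped, ydist = .prediction)) +
  geom_point(data = raw, aes(x = Hopped, y = kJs), alpha = 0.3)
```

Calling `print(p)` draws the half-eye distributions with the raw points on the same axis. Keeping the global mapping in `ggplot()` and setting `inherit.aes = FALSE` in `geom_point()` should be an equivalent way to stop the point layer from inheriting `ydist`.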