Auto-Listing comparisons in ggbetweenstats over dates (or datetimes)

40 views Asked by At

i'm measuring some physical measures over time then i would like to plot using violinplots in ggbetweenstats. I want to compare only first group with all followings, without writing, for each graph and for each day, the comparisons list and updating it every day. My idea was to select all unique dates, create a database with two colomns where in first colomn was earlier date then in the second colomn all the other dates. Then coerce as a list...but it didn't work as i was hoping.

Here my reproducible example using a flight database.

library(nycflights13)

CarrierList<-unique(flights$carrier)
i=12
a<-flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i]& departureDay<="2013-01-10"& departureDay>="2013-01-02") %>% select(departureDay) %>% unique() %>% arrange(departureDay) %>% slice(1) 
aa<-flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i]& departureDay<="2013-01-10"& departureDay>="2013-01-02") %>% select(departureDay) %>% unique() %>% arrange(departureDay)%>% slice(2:n())

ggbetweenstats(
    data = flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i] & departureDay<="2013-01-10" & departureDay>="2013-01-02"),
    x=departureDay,
    y = arr_delay,
    pairwise.display = "none",
    p.adjust.method = "holm", 
    type = "nonparametric",   
    ggtheme  = jtools::theme_apa()) +
    ggsignif::geom_signif(map_signif_level = c("***"=0.001, "**"=0.01, "*"=0.05),
                          comparisons = list(data.frame(a$departureDay,aa$departureDay)
                          ))

this will produce following error output

    `Computation failed in `stat_signif()`
Caused by error in `mapped_discrete()`:
! Can't convert `x` <data.frame> to <double>.`

What i want to produce is something like the output of the following code:


ggbetweenstats(
    data = flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i] & departureDay<="2013-01-10" & departureDay>="2013-01-02"),
    x=departureDay,
    y = arr_delay,
    pairwise.display = "none",
    p.adjust.method = "holm", 
    type = "nonparametric",   
    ggtheme  = jtools::theme_apa()) +
    ggsignif::geom_signif(y_position = c(300, 310, 320, 330, 340, 350, 360, 370),
                            map_signif_level = c("***"=0.001, "**"=0.01, "*"=0.05),
                          comparisons = list(c("2013-01-02", "2013-01-03"),
                                             c("2013-01-02", "2013-01-04"),
                                             c("2013-01-02", "2013-01-05"),
                                             c("2013-01-02", "2013-01-06"),
                                             c("2013-01-02", "2013-01-07"),
                                             c("2013-01-02", "2013-01-08"),
                                             c("2013-01-02", "2013-01-09"),
                                             c("2013-01-02", "2013-01-10"))
        )

Desired output

I've tried also another way as follow, but it didn't mantain date format

b<-lapply(
    1:7,
    function(i) c(
        combn(a$departureDay, 2)[2, i],
        combn(aa$departureDay, 2)[1, i]
    )
)

Any hints or suggestions??

1

There are 1 answers

0
Allan Cameron On BEST ANSWER

I'm a little confused at how you are trying to create your comparison list. I think I would create a filtered data frame of everything you want to plot, and convert the dates to factors:

library(tidyverse)
library(ggstatsplot)
library(ggsignif)

df <- nycflights13::flights %>% 
  mutate(departureDay = lubridate::make_date(year, month, day)) %>% 
  filter(origin =="JFK" & carrier == "9E" & 
         departureDay <= "2013-01-10" & departureDay >= "2013-01-02") %>% 
  select(departureDay, arr_delay) %>% 
  arrange(departureDay)%>% 
  mutate(departureDay = factor(departureDay)) 

Now you can create a list of all the comparisons you want using lapply with a single one-liner rather than repeating all the dplyr code:

cmp <- lapply(levels(df$departureDay)[-1], \(x) c(levels(df$departureDay)[1],x))

Your plotting code is then much simpler too:

ggbetweenstats(data = df, x = departureDay, y = arr_delay,
               pairwise.comparisons = FALSE, ggtheme = jtools::theme_apa()) +
  geom_signif(map_signif_level = c("***" = 0.001, "**" = 0.01, "*" = 0.05),
              comparisons = cmp, step_increase = 0.05)

enter image description here