Issue converting a grouped_ts tsibble into a fable

59 views Asked by At

I am generally following Hyndman's 3rd edition complex seasonality example in ch 12.1. https://otexts.com/fpp3/complexseasonality.html

His example uses sub-daily data to forecast calls every 5 minutes. I am trying to forecast the number of "returns per hour", by department and location - D and M, using a harmonic regression model with the number of scheduled visits. The data has been previously filtered to only contain the hours between and including 8AM to 5PM, M-F. The issue I am having is converting the index from a proxy of grouped row numbers, repeated over D and M, to the appropriate times for interpretation and modeling, and filtering the data to remove the "test" returns while keeping the number of scheduled visits.

ns_data_tsbl <- ns_data %>%  
  mutate(Appt_Slot = as_datetime(appt_time)) %>% 
  select(Appt_Slot, D, M,  returns, visit_count) %>%
  as_tsibble(key=c(D, M),index=Appt_Slot)

D_M_nsA <- ns_data_tsbl |>
  select(Appt_Slot, D, M,  returns, visit_count) |>
  aggregate_key(D * M, returns = sum(returns), visit_count = sum(visit_count))
D_M_nsA
# A tsibble: 563,040 x 5 [1h] <UTC>
# Key:       D, M [72]
   Appt_Slot           D            M            returns     visit_count
   <dttm>              <chr*>       <chr*>           <int>       <int>
 1 2020-10-01 08:00:00 <aggregated> <aggregated>        24         244
 2 2020-10-01 09:00:00 <aggregated> <aggregated>        22         293
 3 2020-10-01 10:00:00 <aggregated> <aggregated>        34         270
 4 2020-10-01 11:00:00 <aggregated> <aggregated>        20         201
 5 2020-10-01 12:00:00 <aggregated> <aggregated>         3          22
 6 2020-10-01 13:00:00 <aggregated> <aggregated>        44         307
 7 2020-10-01 14:00:00 <aggregated> <aggregated>        28         239
 8 2020-10-01 15:00:00 <aggregated> <aggregated>        39         286
 9 2020-10-01 16:00:00 <aggregated> <aggregated>        44         312
10 2020-10-01 17:00:00 <aggregated> <aggregated>         4          20

At this point everything is nice and grouped by Department D, location M and appointment slot. Next I filter the data to remove the last 50 observations, equal to 5 days of appointment slots. I do this now so I can fit and model and use the last 5 days of visit counts in the appointment slots to forecast returns.

D_M_nsA1 <- D_M_nsA |>
  filter(Appt_Slot < max(ns_data_tsbl$Appt_Slot) - days(5))

Now I re-index the tsibble because any fourier terms wont play nicely with the gaps in non-working hours and days. The new index is based off of row numbers, and repeats per Department D and location M.

D_M_nsB <- D_M_nsA1 |>
  group_by(D, M) %>%
  mutate(t = row_number()) %>%
  update_tsibble(index = t, regular = TRUE)

Here I essentially try and use Hyndman's model with a regression term. At this point I am worried less about the accuracy of the model and more looking to see how it can work with grouped data. I plan on modifying the model later and testing different aggregation strategies.

dhr <- ARIMA(sqrt(returns) ~ 
               PDQ(0, 0, 0) + 
               pdq(d = 0) +
               visit_count +
               fourier(period = "day", K = 10) +
               fourier(period = "week", K = 5) +
               fourier(period = "year", K = 3))

fit <- D_M_nsB |>
  model(
    dhr
    ) |>
  reconcile(
    bu2 = bottom_up(dhr)
  ) 

Next I use the D_M_nsA tsibble from before, filtering to only include the last 50 observations per Department and location M. I get rid of the returns column to keep the visit count which is the scheduled visit count in the future. I add the re-indexed time series values using the maximum row count per unique group from before, and update the tsibble with the new index.

fit_data <- D_M_nsA |>
  filter(Appt_Slot >= max(ns_data_tsbl$Appt_Slot) - days(5)) |>
  select(-returns) |>
  group_by(D, M) |>
  mutate(t = row_number() + max(D_M_nsB$t)) |>
  update_tsibble(index = t, regular = TRUE)
fit_data
# A tsibble: 72 x 5 [1h] <UTC>
# Key:       D, M [72]
# Groups:    D, M [72]
   Appt_Slot           D            M            visit_count     t
   <dttm>              <chr*>       <chr*>             <int> <int>
 1 2023-09-29 17:00:00 <aggregated> <aggregated>           8  7820
 2 2023-09-29 17:00:00 A            <aggregated>           8  7820
 3 2023-09-29 17:00:00 B            <aggregated>           0  7820
 4 2023-09-29 17:00:00 C            <aggregated>           0  7820
 5 2023-09-29 17:00:00 <aggregated> NOR                    0  7820
 6 2023-09-29 17:00:00 <aggregated> POR                    0  7820
 7 2023-09-29 17:00:00 <aggregated> NYC                    0  7820
 8 2023-09-29 17:00:00 <aggregated> DAL                    0  7820
 9 2023-09-29 17:00:00 <aggregated> BVT                    4  7820
10 2023-09-29 17:00:00 <aggregated> SPR                    0  7820

I use fit_data as the new data for the forecasting model, which is nice because it will forecast values only for the days and times I have scheduled visits for.

fc <- fit |>
  forecast(new_data = fit_data)

fc
# A fable: 7,200 x 8 [1]
# Key:     D, M, .model [144]
   D M .model     t         returns .mean Appt_Slot           visit_count
   <chr*>    <chr*>   <chr>  <int>            <dist> <dbl> <dttm>                    <int>
 1 A        BTL      dhr     7820    t(N(0.6, 0.2)) 0.566 2023-09-25 08:00:00           6
 2 A        BTL      dhr     7821  t(N(0.76, 0.21)) 0.785 2023-09-25 09:00:00           8
 3 A        BTL      dhr     7822  t(N(0.78, 0.21)) 0.816 2023-09-25 10:00:00           8
 4 A        BTL      dhr     7823   t(N(0.5, 0.21)) 0.457 2023-09-25 11:00:00           5
 5 A        BTL      dhr     7824 t(N(0.032, 0.21)) 0.209 2023-09-25 12:00:00           0
 6 A        BTL      dhr     7825  t(N(0.87, 0.21)) 0.965 2023-09-25 13:00:00           9
 7 A        BTL      dhr     7826  t(N(0.87, 0.21)) 0.963 2023-09-25 14:00:00           9
 8 A        BTL      dhr     7827  t(N(0.87, 0.21)) 0.962 2023-09-25 15:00:00           9
 9 A        BTL      dhr     7828  t(N(0.49, 0.21)) 0.453 2023-09-25 16:00:00           5
10 A        BTL      dhr     7829 t(N(0.028, 0.21)) 0.209 2023-09-25 17:00:00           0

Now after all that work comes my issue. I want to graph the results of the forecasts of the week prior and the week of the forecasts. The fc mable has an index of t, which is not conducive to interpretation. I dont know how to update the index of mable, then fill this in with the gaps from non-working hours and weekends. If I use the below code to update the index, it changes the fable into a tsiblle and I lose the return forecasts and distributions.

fc_out <- fc |> update_tsibble(index = Appt_Slot) 

# Plot results with last 2 weeks of data
fc |>
  filter (is_aggregated(M), D=='A', .model=="dhr") |>
  fill_gaps() |>
  autoplot(D_M_nsA|> 
             filter(is_aggregated(M), D=='A') |>
             tail(days(14)) |>
             fill_gaps()) +
  labs(y = "returns",
       title = "Department A returns")

Returns an error: Don't know how to automatically pick scale for object of type <grouped_ts/grouped_df/tbl_ts/tbl_df/tbl/data.frame>. Defaulting to continuous. Error in geom_line(): ! Problem while computing aesthetics. ℹ Error occurred in the 1st layer. Caused by error in check_aesthetics(): ! Aesthetics must be either length 1 or the same as the data (106)

0

There are 0 answers