Does fabletools::skill_score respect transformations of the target variable?


While testing the accuracy of some models with fable, I came across an interesting behavior of fabletools::skill_score, which is described in the FPP3 book. If you calculate test accuracy with skill_score(CRPS) for a set of models that includes a NAIVE/SNAIVE model, and the target variable is not transformed, the NAIVE/SNAIVE model gets a skill_score of 0. This aligns with the description in the FPP3 book:

the proportion that the ... method improves over the naïve method based on CRPS

However, if you transform the target variable (e.g. log(x + 1)), the NAIVE/SNAIVE model no longer has a skill_score of 0. This suggests that skill_score may not be honoring the transformation of the target variable. I looked at the source code and did not see any reference to transformations.

Is this the expected behavior of skill_score? If so, is there a way to carry the transformation over to skill_score? Or is skill_score not appropriate for models with transformed target variables?

This code replicates the expected behavior of skill_score on untransformed data:

library(fpp3)

google_stock <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2015) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)

google_stock |> 
  autoplot(Close)

# Hold out the last 80% of observations as the test set; the first 20% is training
test <- google_stock |> 
  slice_tail(prop = .8)

train <- google_stock |> 
  anti_join(test)

# Fit benchmark models to the untransformed Close price
fitted_model <- train |> 
  model(
    Mean = MEAN(Close),
    `Naïve` = NAIVE(Close),
    Drift = NAIVE(Close ~ drift())
  )

goog_fc <- fitted_model |> 
  forecast(h = 12)

fc_acc <- goog_fc |> 
  accuracy(google_stock,
           measures = list(point_accuracy_measures, distribution_accuracy_measures, crps_skill = skill_score(CRPS))) |> 
  select(.model, .type, CRPS, crps_skill, RMSSE)

fc_acc
# A tibble: 3 × 5
  .model .type  CRPS crps_skill RMSSE
  <chr>  <chr> <dbl>      <dbl> <dbl>
1 Drift  Test   38.2     0.0955  5.09
2 Mean   Test  109.     -1.59   12.6 
3 Naïve  Test   42.2     0       5.49
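
For reference, these untransformed results are consistent with the skill score being the proportional improvement in CRPS over the naïve benchmark, i.e. 1 - CRPS / CRPS_naive as described in FPP3. Recomputing that ratio by hand from the accuracy table (the only assumption here is that formula) gives the same values up to rounding of the printed output:

# Recompute the skill score manually, using the Naïve row as the benchmark
fc_acc |> 
  mutate(manual_skill = 1 - CRPS / CRPS[.model == "Naïve"])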

This code replicates the unexpected behavior with the same data transformed by log(x + 1):

# Refit the same models with a log(x + 1) transformation of the target
fitted_model_transformed <- train |> 
  model(
    Mean = MEAN(log(Close + 1)),
    `Naïve` = NAIVE(log(Close + 1)),
    Drift = NAIVE(log(Close + 1) ~ drift())
  )

goog_fc_transformed <- fitted_model_transformed |> 
  forecast(h = 12)

fc_acc_transformed <- goog_fc_transformed |> 
  accuracy(google_stock,
           measures = list(point_accuracy_measures, distribution_accuracy_measures, crps_skill = skill_score(CRPS))) |> 
  select(.model, .type, CRPS, crps_skill, RMSSE)

fc_acc_transformed
# A tibble: 3 × 5
  .model .type  CRPS crps_skill RMSSE
  <chr>  <chr> <dbl>      <dbl> <dbl>
1 Drift  Test   36.3     0.140   4.97
2 Mean   Test  110.     -1.61   12.6 
3 Naïve  Test   40.8     0.0353  5.42

I would expect the Naïve model's crps_skill to be 0, because it cannot improve on itself.
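
Computing the same ratio by hand against the transformed Naïve row gives Naïve = 0 and Drift ≈ 0.110, which do not match the crps_skill values above; this is what makes me think the benchmark used by skill_score() is not the transformed Naïve model:

# Hand-computed skill relative to the Naïve row of this table; assuming the
# 1 - CRPS / CRPS_benchmark definition, these differ from crps_skill above
fc_acc_transformed |> 
  mutate(manual_skill = 1 - CRPS / CRPS[.model == "Naïve"])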

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fable_0.3.3       feasts_0.3.1      fabletools_0.3.4  tsibbledata_0.4.1 tsibble_1.1.3     ggplot2_3.4.3     lubridate_1.9.2  
 [8] tidyr_1.3.0       dplyr_1.1.3       tibble_3.2.1      fpp3_0.5         

loaded via a namespace (and not attached):
 [1] rappdirs_0.3.3       plotly_4.10.2        utf8_1.2.4           generics_0.1.3       anytime_0.3.9        digest_0.6.33       
 [7] magrittr_2.0.3       grid_4.3.1           timechange_0.2.0     pkgload_1.3.2.1      fastmap_1.1.1        jsonlite_1.8.7      
[13] modeldata_1.2.0      httr_1.4.7           purrr_1.0.2          fansi_1.0.5          viridisLite_0.4.2    scales_1.2.1        
[19] numDeriv_2016.8-1.1  textshaping_0.3.6    lazyeval_0.2.2       cli_3.6.1            rlang_1.1.1          crayon_1.5.2        
[25] ellipsis_0.3.2       munsell_0.5.0        withr_2.5.1          tools_4.3.1          colorspace_2.1-0     vctrs_0.6.4         
[31] R6_2.5.1             lifecycle_1.0.3      htmlwidgets_1.6.2    ragg_1.2.5           pkgconfig_2.0.3      progressr_0.14.0    
[37] pillar_1.9.0         gtable_0.3.4         rsconnect_1.1.0      data.table_1.14.8    glue_1.6.2           Rcpp_1.0.11         
[43] systemfonts_1.0.4    tidyselect_1.2.0     rstudioapi_0.15.0    farver_2.1.1         htmltools_0.5.6      labeling_0.4.3      
[49] compiler_4.3.1       distributional_0.3.2

1 Answer

Answer by Rob Hyndman (accepted):

You can use several different transformations in the same model() call, so it makes no sense for skill_score() to use a benchmark model with anything other than no transformation. Otherwise, the scores for different models could use different benchmarks. Consequently, the benchmark Naive method must use an untransformed variable.
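
A quick numerical check is consistent with this (assuming the 1 - CRPS / CRPS_benchmark definition): the crps_skill values in fc_acc_transformed line up with the CRPS of the untransformed Naïve forecast in fc_acc (42.2), e.g. 1 - 36.3/42.2 ≈ 0.14 for the Drift model. If a skill score against a benchmark fitted on the same transformation is wanted, it can be computed by hand from the accuracy table, for example:

# Reproduce the built-in crps_skill using the untransformed Naïve CRPS from
# fc_acc as the benchmark (values should match up to rounding), and also
# compute a skill score relative to the transformed Naïve row of this table
benchmark_crps <- fc_acc |> 
  filter(.model == "Naïve") |> 
  pull(CRPS)

fc_acc_transformed |> 
  mutate(skill_vs_untransformed_naive = 1 - CRPS / benchmark_crps,
         skill_vs_transformed_naive   = 1 - CRPS / CRPS[.model == "Naïve"])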