CET/CEST daylight saving detection routine in base R


I am dealing with the well-known issue of daylight saving time, so this might be a duplicate. The only related question I could find is this, which does not really help with my issue. My specific problem is that my data has a gap at the second occurrence of the time step "2015-10-25 02:00:00". This seems to be the cause, and as a result the switch to CET happens one hour late, at "2015-10-25 03:00:00".

While building the example data, it seems to me that restarting the R session sometimes gives different results...

Test data on a fresh session:

Sys.setenv(TZ="Europe/Berlin")

ts1 <- c(seq(as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"), by = 600, length.out = 12 ))
ts2 <- c(seq(as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin"), by = 1200, length.out = 9 ))
ts3 <- ts2[c(1,2,3,4,6,7,8,9)]

This gives:

> ts1
 [1] "2015-10-25 02:00:00 CET" "2015-10-25 02:10:00 CET" "2015-10-25 02:20:00 CET"
 [4] "2015-10-25 02:30:00 CET" "2015-10-25 02:40:00 CET" "2015-10-25 02:50:00 CET"
 [7] "2015-10-25 03:00:00 CET" "2015-10-25 03:10:00 CET" "2015-10-25 03:20:00 CET"
[10] "2015-10-25 03:30:00 CET" "2015-10-25 03:40:00 CET" "2015-10-25 03:50:00 CET"

> ts2
 [1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
 [4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:00:00 CET"  "2015-10-25 02:20:00 CET" 
 [7] "2015-10-25 02:40:00 CET"  "2015-10-25 03:00:00 CET"  "2015-10-25 03:20:00 CET" 

> ts3
 [1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
 [4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:20:00 CET"  "2015-10-25 02:40:00 CET" 
 [7] "2015-10-25 03:00:00 CET"  "2015-10-25 03:20:00 CET" 

After that, run the first line again:

ts1 <- c(seq(as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"), by = 600, length.out = 12 ))

To get

> ts1
 [1] "2015-10-25 02:00:00 CEST" "2015-10-25 02:10:00 CEST" "2015-10-25 02:20:00 CEST"
 [4] "2015-10-25 02:30:00 CEST" "2015-10-25 02:40:00 CEST" "2015-10-25 02:50:00 CEST"
 [7] "2015-10-25 02:00:00 CET"  "2015-10-25 02:10:00 CET"  "2015-10-25 02:20:00 CET" 
[10] "2015-10-25 02:30:00 CET"  "2015-10-25 02:40:00 CET"  "2015-10-25 02:50:00 CET" 

And finally, this

ts4 = c(as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:20:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:40:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:20:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 02:40:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 03:00:00", tz="Europe/Berlin"),
        as.POSIXct("2015-10-25 03:20:00", tz="Europe/Berlin"))

leads to this:

> ts4
 [1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
 [4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
 [7] "2015-10-25 02:40:00 CEST" "2015-10-25 03:00:00 CET"  "2015-10-25 03:20:00 CET" 

As you can see, the second run of ts1 gives the correct DST labels, which makes reproducibility a mess. Actually, only ts4 comes close to reproducing my issue. In my real data, however, the gap is at the second occurrence of "2015-10-25 02:00:00", which might not be the problem here... I hope this is well known and somebody has a solution. I would appreciate a base R solution.
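To check which of the two 02:00 instants a value actually holds, it may help to look at the underlying epoch seconds and the DST flag rather than the printed label alone. The following is only a sketch: `describe_instants` is a hypothetical helper name, and the two test values are built from UTC strings on the assumption that UTC has no repeated hour and therefore parses unambiguously.

```r
# Hypothetical helper: show local clock time, DST flag and epoch seconds.
describe_instants <- function(x, tz = "Europe/Berlin") {
  lt <- as.POSIXlt(x, tz = tz)
  data.frame(
    local = format(x, "%Y-%m-%d %H:%M:%S", tz = tz),
    isdst = lt$isdst,        # 1 = CEST (summer time), 0 = CET
    epoch = as.numeric(x),   # seconds since 1970-01-01 00:00:00 UTC
    stringsAsFactors = FALSE
  )
}

# The two instants that both read 02:00 on a Berlin wall clock,
# specified in UTC so that there is nothing to guess.
both_0200 <- c(
  as.POSIXct("2015-10-25 00:00:00", tz = "UTC"),  # 02:00 CEST
  as.POSIXct("2015-10-25 01:00:00", tz = "UTC")   # 02:00 CET
)

info <- describe_instants(both_0200)
print(info)
```

Applied to ts2, ts3 and ts4, the `epoch` column should show directly whether two entries that print with the same clock time are one hour apart or identical.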

edit:

So this seems to be the core of the problem:

Restarting R session...

> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CET"
> as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin")
[1] "2015-10-25 01:40:00 CEST"
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CEST"
> rm(list = ls())
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CEST"
> as.POSIXct("2015-10-25 03:00:00", tz="Europe/Berlin")
[1] "2015-10-25 03:00:00 CET"
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CET"
  • What is the routine behind as.POSIXct, and where do I find it?
  • Where is the information stored that R uses to decide whether 02:00 is CET or CEST?
  • Any ideas why this could fail on a long time series? Compare the time series ts2 (defined as a sequence, correct DST) and ts4 (a vector built from single components, incorrect DST), which should be identical.
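A possible workaround in base R, assuming the measurements are equally spaced: build the series from a start time given in UTC, switch only the display time zone, and create gaps by indexing instead of parsing local strings again. This is a sketch under those assumptions, not an explanation of the parsing behaviour above; the variable names are made up for the example.

```r
# 23:40 UTC on 2015-10-24 is 01:40 CEST in Berlin.
start_utc <- as.POSIXct("2015-10-24 23:40:00", tz = "UTC")

# Regular 20-minute steps are computed on the underlying seconds,
# so the repeated local hour does not matter here.
full <- seq(start_utc, by = 1200, length.out = 9)

# Change only how the instants are displayed, not the instants themselves.
attr(full, "tzone") <- "Europe/Berlin"

# Drop the second 02:00 (element 5) by index to mimic the data gap.
with_gap <- full[-5]

print(format(full, "%Y-%m-%d %H:%M:%S %Z"))
print(format(with_gap, "%Y-%m-%d %H:%M:%S %Z"))
```

Because the values are never converted back from ambiguous local strings, the element after the gap keeps its original instant (02:20 CET) regardless of what was parsed earlier in the session.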