I am dealing with the well-known daylight saving time issue, so this might be a duplicate. The only related question I could find is this one, which is not really helpful for my issue. My specific issue is that I have a data gap at the second occurrence of the time step "2015-10-25 02:00:00".
This seems to be the problem: it causes the switch to CET to happen one hour late, at "2015-10-25 03:00:00".
While building the example data, it seemed to me that restarting the R session sometimes gives different results...
Test data in a fresh session:
Sys.setenv(TZ="Europe/Berlin")
ts1 <- c(seq(as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"), by = 600, length.out = 12 ))
ts2 <- c(seq(as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin"), by = 1200, length.out = 9 ))
ts3 <- ts2[c(1,2,3,4,6,7,8,9)]
This gives:
> ts1
[1] "2015-10-25 02:00:00 CET" "2015-10-25 02:10:00 CET" "2015-10-25 02:20:00 CET"
[4] "2015-10-25 02:30:00 CET" "2015-10-25 02:40:00 CET" "2015-10-25 02:50:00 CET"
[7] "2015-10-25 03:00:00 CET" "2015-10-25 03:10:00 CET" "2015-10-25 03:20:00 CET"
[10] "2015-10-25 03:30:00 CET" "2015-10-25 03:40:00 CET" "2015-10-25 03:50:00 CET"
> ts2
[1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
[4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:00:00 CET" "2015-10-25 02:20:00 CET"
[7] "2015-10-25 02:40:00 CET" "2015-10-25 03:00:00 CET" "2015-10-25 03:20:00 CET"
> ts3
[1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
[4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:20:00 CET" "2015-10-25 02:40:00 CET"
[7] "2015-10-25 03:00:00 CET" "2015-10-25 03:20:00 CET"
After that, run this again:
ts1 <- c(seq(as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"), by = 600, length.out = 12 ))
to get:
> ts1
[1] "2015-10-25 02:00:00 CEST" "2015-10-25 02:10:00 CEST" "2015-10-25 02:20:00 CEST"
[4] "2015-10-25 02:30:00 CEST" "2015-10-25 02:40:00 CEST" "2015-10-25 02:50:00 CEST"
[7] "2015-10-25 02:00:00 CET" "2015-10-25 02:10:00 CET" "2015-10-25 02:20:00 CET"
[10] "2015-10-25 02:30:00 CET" "2015-10-25 02:40:00 CET" "2015-10-25 02:50:00 CET"
And finally, this
ts4 = c(as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:20:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:40:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:20:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 02:40:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 03:00:00", tz="Europe/Berlin"),
as.POSIXct("2015-10-25 03:20:00", tz="Europe/Berlin"))
leads to this:
> ts4
[1] "2015-10-25 01:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
[4] "2015-10-25 02:40:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:20:00 CEST"
[7] "2015-10-25 02:40:00 CEST" "2015-10-25 03:00:00 CET" "2015-10-25 03:20:00 CET"
As you can see, the second run of ts1 gives the correct DST. This makes reproducibility a mess. Actually, only ts4 comes close to reproducing my issue. But in my real data, I have the data gap at the second occurrence of "2015-10-25 02:00:00", which might not be the problem here... I hope this is well known and somebody has a solution. I would appreciate a base R solution.
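One possible base R workaround, given only as a sketch: build the series on the UTC axis, where no hour repeats, and switch just the display time zone afterwards. It assumes the first reading is known to be 01:40 CEST (23:40 UTC on the previous day); the names start_utc, ts_utc, ts_local and ts_gap are made up for this illustration.

```r
# Assumption: the first timestamp is 01:40 CEST, i.e. 23:40 UTC on the day before.
start_utc <- as.POSIXct("2015-10-24 23:40:00", tz = "UTC")

# Step in seconds on the UTC axis, where no wall-clock hour occurs twice.
ts_utc <- seq(start_utc, by = 1200, length.out = 9)

# Change only the display time zone; the underlying instants stay the same.
ts_local <- ts_utc
attr(ts_local, "tzone") <- "Europe/Berlin"

# Drop the fifth element to mimic the data gap at the second 02:00.
ts_gap <- ts_local[-5]

# Print with the zone abbreviation so CEST and CET readings can be told apart.
print(format(ts_gap, "%Y-%m-%d %H:%M:%S %Z"))
```

Because the instants are fixed before the time zone is attached, removing an element should not change how the remaining ones are labelled.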
edit:
So this seems to be the core of the problem:
Restarting R session...
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CET"
> as.POSIXct("2015-10-25 01:40:00", tz="Europe/Berlin")
[1] "2015-10-25 01:40:00 CEST"
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CEST"
> rm(list = ls())
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CEST"
> as.POSIXct("2015-10-25 03:00:00", tz="Europe/Berlin")
[1] "2015-10-25 03:00:00 CET"
> as.POSIXct("2015-10-25 02:00:00", tz="Europe/Berlin")
[1] "2015-10-25 02:00:00 CET"
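To check which of the two possible instants a given call actually returned, the printed label can be compared with the underlying number of seconds. This is only a diagnostic sketch; cest_candidate and cet_candidate are names introduced here, and which one matches may differ between sessions, as shown above.

```r
x <- as.POSIXct("2015-10-25 02:00:00", tz = "Europe/Berlin")

# The two candidate instants, written unambiguously in UTC.
cest_candidate <- as.POSIXct("2015-10-25 00:00:00", tz = "UTC")  # 02:00 CEST
cet_candidate  <- as.POSIXct("2015-10-25 01:00:00", tz = "UTC")  # 02:00 CET

print(as.numeric(x))                        # seconds since 1970-01-01 UTC
print(format(x, tz = "UTC", usetz = TRUE))  # the same instant displayed in UTC
print(as.POSIXlt(x)$isdst)                  # 1 while DST (CEST) applies, 0 for CET

# Compare the raw numbers to see which candidate was chosen.
print(as.numeric(x) == as.numeric(cest_candidate))
print(as.numeric(x) == as.numeric(cet_candidate))
```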
- What is the routine behind as.POSIXct and where do I find it?
- Where is the information stored that R uses to decide whether 2:00 is CET or CEST?
- Any ideas why this could fail on a long time series? I am asking with regard to the identical time series ts2 (defined as a sequence -> correct DST) and ts4 (vector of single components -> incorrect DST).
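Regarding the difference between ts2 and ts4: each string in ts4 is parsed on its own, so every "02:xx" is ambiguous by itself. If the raw data carried a UTC offset, the ambiguity should disappear. The following is a sketch under that assumption; the stamps vector and ts5 are hypothetical, and it relies on the %z input conversion described in ?strptime.

```r
# Hypothetical input: wall-clock times around the switch, each with its UTC offset.
stamps <- c("2015-10-25 02:40:00 +0200",   # last CEST reading
            "2015-10-25 02:00:00 +0100",   # first CET reading
            "2015-10-25 02:20:00 +0100")

# %z reads the signed offset, so every string maps to exactly one instant.
ts5 <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S %z", tz = "Europe/Berlin")

# Steps between consecutive instants, in seconds.
print(diff(as.numeric(ts5)))
```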