In R, is the %OSn time format only valid for formatting, but not parsing?

Consider this R code, which uses a defined time format string (the timeFormat variable below) to format and parse dates:


```r
time = as.POSIXct(1433867059, origin = "1970-01-01")
print(time)
print( as.numeric(time) )

timeFormat = "%Y-%m-%d %H:%M:%OS3"
tz = "EST"

timestamp = format(time, format = timeFormat, tz = tz)
print(timestamp)

timeParsed = as.POSIXct(timestamp, format = timeFormat, tz = tz)
print(timeParsed)
print( as.numeric(timeParsed) )
```

If I paste that into Rgui on my Windows box, which is running the latest (3.2.0) stable release, I get this:


```
> time = as.POSIXct(1433867059, origin = "1970-01-01")
> print(time)
[1] "2015-06-09 12:24:19 EDT"
> print( as.numeric(time) )
[1] 1433867059
> 
> timeFormat = "%Y-%m-%d %H:%M:%OS3"
> tz = "EST"
> 
> timestamp = format(time, format = timeFormat, tz = tz)
> print(timestamp)
[1] "2015-06-09 11:24:19.000"
> 
> timeParsed = as.POSIXct(timestamp, format = timeFormat, tz = tz)
> print(timeParsed)
[1] NA
> print( as.numeric(timeParsed) )
[1] NA
```

Notice how the time format, which ends with %OS3, produces the correct timestamp, with three decimal places (millisecond resolution).

However, that same time format cannot parse that timestamp back into the original POSIXct value; parsing fails and returns NA.

Does anyone know what is going on?

A web search turned up this Stack Overflow question, where a commenter on the first answer, Waldir Leoncio, appears to describe the same parsing bug with %OS3 that I see:

"use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3"), but it doesn't work for me. Henrik noted that the function's help page, ?strptime states that the %OS3 bit is OS-dependent. I'm using an updated Ubuntu 13.04 and using %OS3 yields NA."

The help page mentioned in the quote above is likely ?strptime, which is unfortunately terse and merely says:

"Specific to R is %OSn, which for output gives the seconds truncated to 0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it uses the setting of getOption("digits.secs"), or if that is unset, n = 3). Further, for strptime %OS will input seconds including fractional seconds. Note that %S ignores (and not rounds) fractional parts on output."

That final sentence about strptime (i.e. parsing) is subtle: it says "for strptime %OS". Note the absence of the 'n': it says %OS, not %OSn.
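A small sketch of what that sentence implies for input, comparing %S and %OS when strptime() parses the same string. It relies on strptime() ignoring trailing characters that the format does not consume (documented in ?strptime); the sample string and the "UTC" time zone are illustrative choices, not taken from the question:

```r
# Parse the same timestamp twice: once with %S, once with %OS.
x <- "2015-06-09 11:24:19.002"

withS  <- strptime(x, "%Y-%m-%d %H:%M:%S",  tz = "UTC")
withOS <- strptime(x, "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# %S reads whole seconds only; the trailing ".002" is left unconsumed and ignored.
print(withS$sec)
# %OS reads the seconds including the fractional part.
print(withOS$sec)
```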

Does that mean that %OSn can NOT be used for parsing, only for formatting?

That is what I have empirically found, but is it expected behavior or a bug?

This is very annoying if it is expected behavior, since it means I need different time formats for formatting and parsing. I have never seen that in any other language's date API...
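If keeping two hand-written formats in sync is the concern, one option is to write only the output format and derive the input format from it by replacing %OSn with %OS. The helper name parseFormat is hypothetical, and the sketch assumes %OSn appears only as the seconds field; "America/New_York" is used instead of "EST" for the reason given in the answer below:

```r
# Hypothetical helper: turn an output format into an input format by
# replacing %OS0..%OS6 with plain %OS, which strptime() reads with fractions.
parseFormat <- function(fmt) gsub("%OS[0-6]", "%OS", fmt)

timeFormat <- "%Y-%m-%d %H:%M:%OS3"
tz <- "America/New_York"

# .25 is exactly representable in binary, so truncation loses nothing here.
time <- as.POSIXct(1433867059.25, origin = "1970-01-01", tz = tz)

# Output: %OS3 writes three (truncated) decimal places.
timestamp <- format(time, format = timeFormat, tz = tz)
print(timestamp)

# Input: the same pattern, with %OS3 rewritten to %OS.
timeParsed <- as.POSIXct(timestamp, format = parseFormat(timeFormat), tz = tz)
print(as.numeric(timeParsed) - as.numeric(time))
```

For fractions that are not exactly representable, expect the round trip to agree only to within the truncated precision.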

(Aside: I am aware that there is another issue, even if you just want to format, with %OSn: R truncates fractional parts instead of rounds. For those not aware of this bad behavior, its hazards are discussed here, here, and here.)
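One common workaround for the truncation, sketched here under stated assumptions rather than as an official API, is to add half a unit of the last displayed digit before formatting, so that truncation behaves like round-half-up. The function name formatRounded and its defaults (three digits, "UTC") are hypothetical, and the sketch assumes post-1970 (non-negative) times:

```r
# Hypothetical helper: %OSn truncates, so shift the time by half a unit in
# the last kept digit first; truncating afterwards then rounds half-up.
formatRounded <- function(x, digits = 3, tz = "UTC") {
  half <- 0.5 * 10^(-digits)
  format(x + half, format = paste0("%Y-%m-%d %H:%M:%OS", digits), tz = tz)
}

x <- as.POSIXct(1433867059.002, origin = "1970-01-01", tz = "UTC")

# Plain %OS3 may show .001 here: 1433867059.002 is not exactly representable
# as a double, and truncation can drop the last digit.
print(format(x, "%Y-%m-%d %H:%M:%OS3", tz = "UTC"))
print(formatRounded(x))
```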

Best answer, by Joshua Ulrich:

This is expected behavior, not a bug. "%OSn" is for output. "%OS" is for input, and includes fractional seconds, as it says in your second blockquote:

Further, for strptime %OS will input seconds including fractional seconds.

```r
options(digits.secs = 6)
as.POSIXct("2015-06-09 11:24:19.002", tz = "America/New_York", format = "%Y-%m-%d %H:%M:%OS")
# [1] "2015-06-09 11:24:19.002 EDT"
```

Also note that "EST" is an ambiguous timezone, and probably not what you expect. See the Time zone names section of ?timezone.
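On systems using the IANA time zone database, "EST" is typically a fixed UTC-5 offset with no daylight saving time, which is why the question's timestamp reads 11:24:19 rather than the 12:24:19 EDT printed for the same instant. A small sketch contrasting it with the location-based "America/New_York" zone, using the instant from the question:

```r
# 1433867059 is 2015-06-09 16:24:19 UTC.
time <- as.POSIXct(1433867059, origin = "1970-01-01", tz = "UTC")

# "EST" applies a fixed UTC-5 offset, even in June ...
print(format(time, "%Y-%m-%d %H:%M:%S %Z", tz = "EST"))
# ... whereas a location-based zone switches to EDT (UTC-4) in summer.
print(format(time, "%Y-%m-%d %H:%M:%S %Z", tz = "America/New_York"))
```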