I already have a partial answer to the problem here, which I understand as far as it is explained: How to most efficiently restructure a character string for fasttime in data.table

However, the task has been extended and now needs to deal with a variation of the original formatting.

I have a large dataset, with a column of dates of character class in the form of:

01 Jan 2014

or:

dd MMM yyyy

I want to restructure these so they can be fed into fastPOSIXct, which only accepts character input in POSIXct order:

yyyy-mm-dd

The question linked above notes that an efficient approach is to use a regex and then pass the output to fasttime. Do I need to extend this with a step that recognises month abbreviations, converts them to numbers, and then rearranges the string? How would I do this? I know that month.abb exists as a built-in constant. Should I be using it, or is there a smarter way?

There is 1 answer

SabDeM (best answer)

What about using lubridate:

x <- "01 Jan 2014"
x
[1] "01 Jan 2014"
library(lubridate)
dmy(x)
[1] "2014-01-01 UTC"

Of course, lubridate's parsing functions accept a tz argument too. For the complete list of valid time zone names, see OlsonNames().
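As a small illustration of the tz argument mentioned above, here is a minimal sketch. The zone "Europe/London" is an assumption picked only for the example; any name returned by OlsonNames() should work the same way.

```r
library(lubridate)

# Assumed zone, chosen only for illustration
tz_name <- "Europe/London"

# Check that the name is one of the zones R knows about
stopifnot(tz_name %in% OlsonNames())

# With tz supplied, dmy() returns a POSIXct in that zone
parsed <- dmy("01 Jan 2014", tz = tz_name)
parsed
attr(parsed, "tzone")
```

Checking the name against OlsonNames() first is optional, but it catches a misspelt zone before any parsing happens.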

Benchmark

I decided to update this answer with some empirical data, using the microbenchmark package and the lubridate option that makes it use fasttime.

library(microbenchmark)
microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
   expr      min      lq     mean   median      uq     max neval
 dmy(x) 1.992639 2.02567 2.142212 2.041514 2.07153 39.1384 10000

options(lubridate.fasttime = TRUE)

microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
   expr      min      lq     mean   median       uq      max neval
 dmy(x) 1.993326 2.02488 2.136748 2.039467 2.065326 163.2008 10000
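
For comparison, the month.abb route raised in the question can be sketched in base R. This is not from the linked answer: to_iso_date is a hypothetical helper, and it assumes every value is fixed-width "dd MMM yyyy" with English month abbreviations. The fasttime call is guarded so the sketch still runs when the package is not installed.

```r
# Hypothetical helper (not from the original thread): rearrange "dd MMM yyyy"
# into "yyyy-mm-dd" using the built-in month.abb constant.
# Assumes fixed-width input such as "01 Jan 2014".
to_iso_date <- function(x) {
  day   <- substr(x, 1, 2)
  month <- match(substr(x, 4, 6), month.abb)  # "Jan" -> 1, ..., "Dec" -> 12
  year  <- substr(x, 8, 11)
  sprintf("%s-%02d-%s", year, month, day)
}

dates <- c("01 Jan 2014", "15 Mar 2014", "31 Dec 2013")
iso <- to_iso_date(dates)
iso

# Pass the rearranged strings to fastPOSIXct() if fasttime is available
if (requireNamespace("fasttime", quietly = TRUE)) {
  fasttime::fastPOSIXct(iso, tz = "GMT")
}
```

Everything here is vectorised, so it can be applied to a whole column at once. An unrecognised month abbreviation produces an NA month rather than an error, so it may be worth checking for that before parsing.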