dplyr - mutate_each - colswise coercion to POSIXlt fails


I recently came across dplyr and, as a newbie, like it very much. So I am trying to convert some of my base-R code into dplyr code.

Working with air traffic control data, I am struggling to coerce timestamps inside a mutate_each() call, using lubridate to parse them and as.POSIXlt to convert them. I need the POSIXlt format because I have to work with local times (at different locations) later on. Reading in the data delivers a data frame of character columns. The following is a simplified example:

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC)

The function I wrote reads as follows:

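library(lubridate)  # provides parse_date_time()

make_POSIXlt <- function(vec, tz = "UTC") {
  # parse strings such as "01-Jan-2013 04:02:24", then convert to POSIXlt
  vec <- parse_date_time(vec, orders = "dmy_hms", tz = tz)
  as.POSIXlt(vec, tz = tz)
}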

The code works fine when executed with a single column:

flights$MVT_TIME_UTC <- make_POSIXlt(flights$MVT_TIME_UTC)

If I run the following dplyr code the function fails:

flights$BLOCK_TIME_UTC <- mutate_each(flights, funs(make_POSIXlt(.)), MVT_TIME_UTC)
Error: wrong result size (9), expected 6 or 1

The issue seems to be linked to the as.POSIXlt call. If that line is commented out, the code works within mutate_each and coerces the timestamp into POSIXct.

Any idea/help on what is wrong? Obviously, my data has several timestamps that I would like to coerce with mutate_each (or any other suitable dplyr function) ...

Answer by Ray (best answer)

Revisiting my question about 4 years later, I realised that I forgot to mark it as answered. However, this also gives me the chance to document how this (relatively) simple type coercion can now be solved elegantly with dplyr and lubridate.

Key lessons learned:

  1. Never use POSIXlt in a data frame (or in its successor, the tibble, although you can now work with list columns).
  2. Coerce date-timestamps with the helpful parser functions from the lubridate package.
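To illustrate the first point, here is a minimal base-R sketch of why POSIXlt does not fit into a data frame column. It assumes nothing beyond base R; the variable names `ct` and `lt` are only for illustration. A POSIXct value is stored as a plain numeric vector, one number per timestamp, whereas a POSIXlt value is stored as a list of components (seconds, minutes, hours, ...). The "wrong result size (9)" in the error above is probably that component count rather than the number of timestamps, although the exact count may differ between R versions.

```r
# One timestamp, stored both ways
ct <- as.POSIXct("2013-01-01 04:02:24", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct: a numeric vector (seconds since 1970-01-01 UTC), one value per timestamp
is.numeric(unclass(ct))
length(unclass(ct))

# POSIXlt: a list of components, regardless of how many timestamps it holds
is.list(unclass(lt))
names(unclass(lt))     # sec, min, hour, mday, mon, year, ...
length(unclass(lt))    # number of components, not number of timestamps
```

Because the underlying length of a POSIXlt object does not match the number of rows, storing timestamps as POSIXct and converting only when needed is the safer choice.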

For the example from above

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC)

library(dplyr)  # provides mutate() and %>%
flights <- flights %>% mutate(MVT_TIME_UTC = lubridate::dmy_hms(MVT_TIME_UTC))

will coerce the timestamps in MVT_TIME_UTC. Check the documentation on lubridate for other parsers and/or how to handle local time zones.
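For the original goal of coercing several timestamp columns at once, a sketch along the following lines may help. It assumes dplyr 1.0 or later (for across()) and lubridate. The second timestamp column OFF_BLOCK_UTC, the derived column MVT_TIME_TOKYO, and the time zone "Asia/Tokyo" are hypothetical and only serve the illustration.

```r
library(dplyr)
library(lubridate)

# Small example; OFF_BLOCK_UTC is a hypothetical second timestamp column
flights <- data.frame(
  ICAO_ADEP     = c("DGAA", "RJTT"),
  MVT_TIME_UTC  = c("01-Jan-2013 04:02:24", NA),
  OFF_BLOCK_UTC = c("01-Jan-2013 03:50:00", "01-Jan-2013 04:10:00"),
  stringsAsFactors = FALSE
)

# Parse every column whose name ends in "_UTC" into POSIXct
flights <- flights %>%
  mutate(across(ends_with("_UTC"), ~ dmy_hms(.x, tz = "UTC")))

# Show the same instants in a local time zone (one zone for the whole column)
flights <- flights %>%
  mutate(MVT_TIME_TOKYO = with_tz(MVT_TIME_UTC, tzone = "Asia/Tokyo"))
```

Note that with_tz() changes only how the instants are displayed, and a POSIXct column carries a single time zone for all of its rows, so per-location local times would need separate handling.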