Problem converting time series df from chr to date using as.POSIXct

30 views Asked by At

I'm importing a large high resolution time series hydrometric dataset from government website as a csv file. The column 'x.Timestamp' is importing as chr with some unusual characters but when I try to convert it from chr to date, NA is returned. I would be really grateful for any help.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
OW_HR_Discharge_DataV2<- data.frame(
  stringsAsFactors = FALSE,
  X.Timestamp = c("1 1955-11-08T00:00:00.000Z",
                  "2 1955-11-08T00:15:00.000Z","3 1955-11-08T00:30:00.000Z",
                  "4 1955-11-08T00:45:00.000Z",
                  "5 1955-11-08T01:00:00.000Z","6 1955-11-08T01:15:00.000Z"),
  Value = c(10.227, 10.226, 10.228, 10.227, 10.227, 10.227),
  Quality.Code = c(31L, 31L, 31L, 31L, 31L, 31L)
)
str(OW_HR_Discharge_DataV2)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ X.Timestamp : chr  "1 1955-11-08T00:00:00.000Z" "2 1955-11-08T00:15:00.000Z" "3 1955-11-08T00:30:00.000Z" "4 1955-11-08T00:45:00.000Z" ...
#>  $ Value       : num  10.2 10.2 10.2 10.2 10.2 ...
#>  $ Quality.Code: int  31 31 31 31 31 31
OW_HR_Discharge_DataV2$X.Timestamp<-as.POSIXct(OW_HR_Discharge_DataV2$X.Timestamp, format = "%Y-%m-%d %H:%M:%S", tz="GMT")
str(OW_HR_Discharge_DataV2)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ X.Timestamp : POSIXct, format: NA NA ...
#>  $ Value       : num  10.2 10.2 10.2 10.2 10.2 
...
#>  $ Quality.Code: int  31 31 31 31 31 31
2

There are 2 answers

0
Adriano Mello On BEST ANSWER

As @Sophie said with other words, your csv's row-numbers appears to be at the beginning of OW_HR_Discharge_DataV2. Remove it with stringr::str_remove() and parse the timestamp with lubridate::ymd_hms():

library(tidyverse)

# ------------------
OW_HR_Discharge_DataV2 <- OW_HR_Discharge_DataV2 %>% 
  as_tibble() %>% 
  mutate(
    new_timestamp = X.Timestamp %>% 
      str_remove("^\\d+\\s") %>% 
      ymd_hms(tz = "GMT"))

# Output ----------
> OW_HR_Discharge_DataV2

# A tibble: 6 × 4
  X.Timestamp                Value Quality.Code new_timestamp      
  <chr>                      <dbl>        <int> <dttm>             
1 1 1955-11-08T00:00:00.000Z  10.2           31 1955-11-08 00:00:00
2 2 1955-11-08T00:15:00.000Z  10.2           31 1955-11-08 00:15:00
3 3 1955-11-08T00:30:00.000Z  10.2           31 1955-11-08 00:30:00
4 4 1955-11-08T00:45:00.000Z  10.2           31 1955-11-08 00:45:00
5 5 1955-11-08T01:00:00.000Z  10.2           31 1955-11-08 01:00:00
6 6 1955-11-08T01:15:00.000Z  10.2           31 1955-11-08 01:15:00

Note1: You don't really need to make your data frame as_tibble() but it makes sense in a tidyverse solution.

Note2: when working with dates, make sure your timezone is in base::OlsonNames() ("GMT" is) and avoid problems.

Hope it helps.

1
Sophie On

Your X.Timestamp does not contain the format passed in parameter to as.POSIXct. in "1 1955-11-08T00:00:00.000Z", there is a superfluous "1" at the beginning. This code works:

OW_HR_Discharge_DataV2$X.Timestamp2 <- as.POSIXct(sapply(strsplit(OW_HR_Discharge_DataV2$X.Timestamp, " "),
                                                         function (x) x[[2]]),
                                                  tz = "GMT")
OW_HR_Discharge_DataV2