I am importing a series of csv files into R. These contain a date/time column, id, and two columns of temperature values.
This would give an example of what the data looks like:
id<-c(1,2,3,4)
date.time<-as.character(c("12/03/17 00:21:28", "12/03/17 02:21:28", "12/03/17 04:21:28", "12/03/17 06:21:28"))
temp1<-c(-3.568,-3.568,-3.598,-3.598)
temp2<-c(-11.577,-11.577,-11.541,-11.433)
df<-data.frame(id,date.time,temp1,temp2)
Because the date/time are not in the format I want, I've been using strptime and formatting them as POSIXlt.
Like:
df$date.time<-strptime(df$date.time, "%d/%m/%y %H:%M:%S")
df$date.time<- as.POSIXlt(df$date.time, "%Y/%m/%d %H:%M:%S", tz="GMT0")
This works fine and gives data which looks like:
id date.time temp1 temp2
1 2017-03-12 0:21:28 -3.568 -11.577
2 2017-03-12 2:21:28 -3.568 -11.577
3 2017-03-12 4:21:28 -3.598 -11.541
4 2017-03-12 6:21:28 -3.598 -11.433
However, I then want to round the time part of the date.time column to the nearest hour. I've been using:
df$date.time<-round(df$date.time, units="hours")
This works just fine 99% of the time. However, in some of the files, R is deleting date.time values, seemingly at random, and giving an NA. Only one or two values in each file is being deleted and I can't see any reason these particular values would be deleted. For example:
id date.time temp1 temp2
1 2017-03-12 0:00:00 -3.568 -11.577
2 NA -3.568 -11.577
3 2017-03-12 4:00:00 -3.598 -11.541
4 2017-03-12 6:00:00 -3.598 -11.433
From what I've read, date/time values can be finicky but this just seems oddly random.
Does anyone know what might be causing this issue and if there is a better way to round the time part of POSIXlt values?
Update: It seems the only times which are deleted are those at 2am on March 12th. So, many of the times which should be rounded to 2017-03-12 02:00:00 are being replaced with NAs. But this does not happen with all of the csv files, only about half. Why is R having problems reading this particular date?
Thanks!
Not adding a timezone for
strptime
is corrupting your string.Lets look at the
head
ofdf$date.time
. Missing/different time zones.By adding
tz="GMT0"
to yourstrptime
function you should get desired result.