I am getting two different outputs when using the same lubridate functions, and I am not sure why


I am new to R and just started using the lubridate package. I wanted to convert the date columns of two data frames, sleep and intensity, to the same format, converting them from character to a date-time class.

The original data frames:

> head(sleep)

          Id              SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
1 1503960366 4/12/2016 12:00:00 AM                 1                327            346
2 1503960366 4/13/2016 12:00:00 AM                 2                384            407
3 1503960366 4/15/2016 12:00:00 AM                 1                412            442
4 1503960366 4/16/2016 12:00:00 AM                 2                340            367
5 1503960366 4/17/2016 12:00:00 AM                 1                700            712
6 1503960366 4/19/2016 12:00:00 AM                 1                304            320

> head(intensity)

          Id          ActivityHour TotalIntensity AverageIntensity
1 1503960366 4/12/2016 12:00:00 AM             20         0.333333
2 1503960366  4/12/2016 1:00:00 AM              8         0.133333
3 1503960366  4/12/2016 2:00:00 AM              7         0.116667
4 1503960366  4/12/2016 3:00:00 AM              0         0.000000
5 1503960366  4/12/2016 4:00:00 AM              0         0.000000
6 1503960366  4/12/2016 5:00:00 AM              0         0.000000

I ran the following conversions:

sleep$SleepDay=as.POSIXct(sleep$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
intensity$ActivityHour=as.POSIXct(intensity$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())

The output:

> head(sleep)
          Id   SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed     date
1 1503960366 2016-04-12                 1                327            346 04/12/16
2 1503960366 2016-04-13                 2                384            407 04/13/16
3 1503960366 2016-04-15                 1                412            442 04/15/16
4 1503960366 2016-04-16                 2                340            367 04/16/16
5 1503960366 2016-04-17                 1                700            712 04/17/16
6 1503960366 2016-04-19                 1                304            320 04/19/16

> head(intensity)
          Id        ActivityHour TotalIntensity AverageIntensity
1 1503960366 2016-04-12 00:00:00             20         0.333333
2 1503960366 2016-04-12 01:00:00              8         0.133333
3 1503960366 2016-04-12 02:00:00              7         0.116667
4 1503960366 2016-04-12 03:00:00              0         0.000000
5 1503960366 2016-04-12 04:00:00              0         0.000000
6 1503960366 2016-04-12 05:00:00              0         0.000000

Why is the time shown for the intensity data frame but not for SleepDay? Was it removed? The code seems identical.

Moreover, the code specifies the format as "%m/%d/%Y %I:%M:%S %p", but the data frame shows the dates in a different order (%Y-%m-%d). Why? I know these questions may seem amateurish, but I want to understand the code in detail.

Thank you.

I tried converting the date format and it worked, but I want to understand the process.


There are 2 answers

Answer by r2evans

R is just being slick: if every timestamp in a vector/column is exactly midnight, R leaves the time component out of what it prints to the screen. The underlying object is unchanged; it is still a full timestamp.
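A minimal sketch of that rule, mirroring the two columns from the question. The vector names sleep_like and intensity_like are hypothetical, and tz = "UTC" plus an English locale (so that %p parses "AM") are assumptions made so the output does not depend on the machine's settings:

```r
fmt <- "%m/%d/%Y %I:%M:%S %p"

# Like SleepDay: every value is exactly midnight (hypothetical sample data)
sleep_like <- as.POSIXct(c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
                         format = fmt, tz = "UTC")

# Like ActivityHour: one midnight value plus one non-midnight value
intensity_like <- as.POSIXct(c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM"),
                             format = fmt, tz = "UTC")

format(sleep_like)      # "2016-04-12" "2016-04-13"
format(intensity_like)  # "2016-04-12 00:00:00" "2016-04-12 01:00:00"

# The midnight instant is identical in both vectors; only the display differs
sleep_like[1] == intensity_like[1]  # TRUE
```

Because intensity holds hourly readings, at least one value is not midnight, so the whole column is printed with times; SleepDay is midnight throughout, so the time is dropped from the display only.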

Demonstration:

tm <- as.POSIXct("4/12/2016 12:00:00 AM", format="%m/%d/%Y %I:%M:%S %p")
tm + rep(0, 4)
# [1] "2016-04-12 EDT" "2016-04-12 EDT" "2016-04-12 EDT" "2016-04-12 EDT"
tm + 0:3
# [1] "2016-04-12 00:00:00 EDT" "2016-04-12 00:00:01 EDT" "2016-04-12 00:00:02 EDT" "2016-04-12 00:00:03 EDT"

Underneath, they're all just numbers with attributes:

dput(tm + 0:4)
# structure(c(1460433600, 1460433601, 1460433602, 1460433603, 1460433604
# ), class = c("POSIXct", "POSIXt"))
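The same split between stored value and display also covers the question about the reversed order. The format= argument only tells as.POSIXct() how to read the input strings; it is not kept on the result, which prints in R's default year-month-day layout. A minimal sketch, again assuming tz = "UTC" and an English locale, of getting the month/day/year layout back with format() when you want it for display:

```r
x <- as.POSIXct("4/12/2016 1:00:00 AM",
                format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Only the class and time zone are stored; the parsing format is not
attributes(x)

# Default display: year first
format(x)                                # "2016-04-12 01:00:00"

# An explicit output format gives month/day/year again, as character text
format(x, format = "%m/%d/%Y %H:%M:%S")  # "04/12/2016 01:00:00"
```

Since format() returns character, it is usually better to keep the POSIXct column for date arithmetic and use the formatted text only for display.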
Answer by Adriano Mello

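A tibble::tibble will help you with this visualisation: unlike a plain vector, list or data.frame, a tibble always prints the time part of a date-time (<dttm>) column.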

library(tidyverse)  # tibble() used below comes from the tibble package
tm <- as.POSIXct("4/12/2016 12:00:00 AM", format="%m/%d/%Y %I:%M:%S %p")
tm <- tm + lubridate::days(1:4)

# Vector
> tm
[1] "2016-04-13 -03" "2016-04-14 -03" "2016-04-15 -03" "2016-04-16 -03"

# Matrix
> matrix(data = tm, ncol = 1)
           [,1]
[1,] 1460516400
[2,] 1460602800
[3,] 1460689200
[4,] 1460775600

# List
> list(tm)
[[1]]
[1] "2016-04-13 -03" "2016-04-14 -03" "2016-04-15 -03" "2016-04-16 -03"

# Data frame
> data.frame(date_time = tm)
   date_time
1 2016-04-13
2 2016-04-14
3 2016-04-15
4 2016-04-16

# Tibble
> tibble(date_time = tm)
# A tibble: 4 × 1
  date_time          
  <dttm>             
1 2016-04-13 00:00:00
2 2016-04-14 00:00:00
3 2016-04-15 00:00:00
4 2016-04-16 00:00:00
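The Matrix case above prints raw numbers because matrix() keeps only the underlying values and drops the POSIXct class; those values are seconds since 1970-01-01 00:00:00 UTC, the same kind of numbers dput() reveals. A minimal base-R sketch, assuming tz = "UTC" (so the number differs from the -03 values above) and an English locale for %p:

```r
tm <- as.POSIXct("4/12/2016 12:00:00 AM",
                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

m <- matrix(data = tm, ncol = 1)
inherits(m, "POSIXct")  # FALSE: the date-time class is gone
m[1, 1]                 # 1460419200, seconds since 1970-01-01 UTC

# Re-attaching the class recovers the same date-time
restored <- as.POSIXct(m[1, 1], origin = "1970-01-01", tz = "UTC")
restored == tm          # TRUE
```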