R - Convert various csv numeric columns to date

41 views Asked by At

I have a csv datasheet with 7 columns filled with numeric values. 3 of these columns represent the date of the measurements: "YYYY", "MM", "DD", followed by 4 columns of relevant corresponding data: "qobs", "ckhs", "qceq", "qcol".

How do I convert the three first columns filled with numeric values into a date-datatype, while maintaining the dependency of the dates to the corresponding date?

#   YYYY, MM, DD, qobs, ckhs, qceq, qcol
# 1 1981, 1, 1, 7.136, 0, 0, 0
# 2 1981, 1, 2, 6.76, 0, 0, 0
# 3 1981, 1, 3, 10.886, 0, 0, 0
# ...

I looked online and only found solutions using the as.Date function that correspond to a single character string. I'm fairly new to programming and have only used R for a couple of days, so an elementary explanation would be greatly appreciated.

3

There are 3 answers

0
Adriano Mello On BEST ANSWER

There's a specific lubridate function for this:

# Toy data
aux <- tibble::tribble(
  ~YYYY, ~MM, ~DD,  ~qobs, ~ckhs, ~qceq, ~qcol,
  1981,   1,  1L,  7.136,     0,     0,     0,
  1981,   1,  2L,   6.76,     0,     0,     0,
  1981,   1,  3L, 10.886,     0,     0,     0)

# lubridate::make_date
aux <- mutate(aux, date = make_date(year = YYYY, month = MM, day = DD))

# Output
> aux
# A tibble: 3 × 8
   YYYY    MM    DD  qobs  ckhs  qceq  qcol date      
  <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <date>    
1  1981     1     1  7.14     0     0     0 1981-01-01
2  1981     1     2  6.76     0     0     0 1981-01-02
3  1981     1     3 10.9      0     0     0 1981-01-03
0
Captain Hat On

A tiydverse solution:

library(vroom)
library(dplyr)
library(lubridate) # a truly wonderful package for this kind of thing

df <- vroom("path-to-your-file.csv"
            col_types = "iiidddd")

df <-
  mutate(
    df, 

    date = make_date(YYYY, MM, DD)

   .keep = "unused", # drop the columns used for computation
   .before = qobs
   )

Explanation

vroom::vroom() is a really useful (and really fast!) function for reading plaintext data into R. It guesses the delimiter from the data and is generally pretty easy to implement.

dplyr::mutate() is a staple of tidyverse data manipulation. It computes new columns within dataframes, or modifies existing columns by overwriting them with new values. Here, we are computing a new column called date using lubridate::make_date(), which does what it says on the tin.

We also specify some of mutate()'s named arguments:

  • .keep = "unused" lets us automatically drop all of the columns we used to calculate our new variable, because we no longer need the YYYY, MM or DD columns
  • .before = qobs just makes our new date column appear in front of qobs, on the left-hand-side of our dataframe.

Edit: I was previously implementing the convoluted:

paste(YYYY, MM, DD, sep = ",") |>
lubridate::ymd()

Thanks to Adriano for showing me that make_date() exists!

1
ihecker On

You can create a date format variable using the YYYY, MM, and DD variables together. Below are two solutions, one using base R and the other using the lubridate package.

df <- data.frame(
  YYYY = c(1981, 1981, 1981),
  MM = c(1, 1, 1),
  DD = c(1, 2, 3),
  qobs = c(7.136, 6.76, 10.886),
  ckhs = c(0, 0, 0),
  qceq = c(0, 0, 0),
  qcol = c(0, 0, 0)
)

# Using as.Date()
df$asDate <- as.Date(paste(df$YYYY, df$MM, df$DD, sep = "-"), format = "%Y-%m-%d")

# Using ymd()
library(lubridate)
df$ymd <- ymd(paste(df$YYYY, df$MM, df$DD, sep = "-"))

From the variable you have created, you can retrieve back the information for the year, month, day and other (see lubridate documentation).

year(df$ymd)
#> [1] 1981 1981 1981
month(df$ymd)
#> [1] 1 1 1
day(df$ymd)
#> [1] 1 2 3