i am relatively new to R and have the problem outlined below. Any feedback and help is much appreciated. Many thanks!
I have a 400k lines CSV dataset with contributions from multiple countries, which appear to use different date formats.
In particular, I have four different formats:
(1) dd.mm.yyyy,
(2) excel date numbers,
(3) yyyymmdd
(4) yyyy-mm-dd
I need to convert all different date formats into one so that I can use the date variable to apply additional filters. My preferred target format is (3) yyyymmdd
Converting (4) into (3) is easy but I don't find a way to convert (1) and (2) into (3) as well.
Since four different date formats are in the same file, I think I need to find a way to select all observations with a specific format and then convert only those into format (3) at a time.
For info see the attached screenshot. the relevant dataframe is called dset and the variable is MS_RegistrationDate

I tried it with the as.Date function as depicted below but this results in all observation receiving the value NA#.
dset$MS_RegistrationDate <- as.Date(dset$MS_RegistrationDate, format="%m.%d.%Y")
dset$MS_RegistrationDate <- as.Date(dset$MS_RegistrationDate, format="%m/%d/%Y")
Below is the dput output to replicate a selection of the dataframe
--
structure(list(Model = c("P 280 B6x2/4NA", "P 280 B6x24NB",
"P 320 B6x24NA", "P 320 B6x24NA", "P 450 B6x24NB", "P 280 B6x2NA",
"P 280 B6x2NA", "P 340 B6x24NB", "P 280 B6x2NA", "P 280 B6x2NA",
"P 450 B6x2NA", "P 450 B6x2NA", "P 320 B6x2NA", "P 320 B6x2NA",
"P 320 B6x2NA", "P 320 B6x2NA", "P 320 B6x2NA", "P 320 B6x2NA",
"P 320 B6x2NA", "P 500 B6x24NB", "P 500 B6x2*4NB"), Make = c("Scania",
"Scania", "Scania", "Scania", "Scania", "Scania", "Scania", "Scania",
"Scania", "Scania", "Scania", "Scania", "Scania", "Scania", "Scania",
"Scania", "Scania", "Scania", "Scania", "Scania", "Scania"),
Manufacturer = c("Scania CV AB", "Scania CV AB", "Scania CV AB",
"Scania CV AB", "Scania CV AB", "Scania CV AB", "Scania CV AB",
"Scania CV AB", "Scania CV AB", "Scania CV AB", "Scania CV AB",
"Scania CV AB", "Scania CV AB", "Scania CV AB", "Scania CV AB",
"Scania CV AB", "Scania CV AB", "Scania CV AB", "Scania CV AB",
"Scania CV AB", "Scania CV AB"), VehicleGroup = c(9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9), VehicleSubgroup = c("9-RD",
"9-RD", "9-RD", "9-RD", "9-LH", "9-RD", "9-RD", "9-RD", "9-RD",
"9-RD", "9-LH", "9-LH", "9-RD", "9-RD", "9-RD", "9-RD", "9-RD",
"9-RD", "9-RD", "9-LH", "9-LH"), MS_RegistrationDate = c("20191129",
"2019-09-26T00:00:00.000", "2019-09-17", "2019-10-08", "30.10.2019",
"2019-09-25", "2019-10-17", "2019-11-21", "2019-10-08", "2019-10-17",
NA, NA, "2019-10-07", "2019-10-07", "2019-10-04", "2019-10-07",
"2019-10-07", "2019-10-07", "2019-10-07", "43766", "43787"
), RegDate_Corrected = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Match = c("Match",
"Match", "Match", "Match", "Match", "Match", "Match", "Match",
"Match", "Match", "OEM_only", "OEM_only", "Match", "Match",
"Match", "Match", "Match", "Match", "Match", "Match", "Match"
), Baseline = c("No", "No", "No", "No", "No", "No", "No",
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No",
"No", "No", "No", "No"), VocationalVehicle = c("No", "No",
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No",
"No", "No", "No", "No", "No", "No", "No", "No", "No"), ZeroEmissionVehicle = c("No",
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No",
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No"
), HybridElectricHDV = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), DualFuelVehicle = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), ExemptedVehicle = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), CO2v = c(710.88,
629.9, 683.52, 683.52, 841.48, 682.21, 682.21, 778.85, 682.21,
682.21, 844.79, 844.79, 613.28, 613.28, 679.51, 613.28, 613.28,
613.28, 613.28, 831.71, 831.71), Engine_Displacement_ltr = c(9.3,
9.3, 9.3, 9.3, 12.7, 9.3, 9.3, 9.3, 9.3, 9.3, 12.7, 12.7,
9.3, 9.3, 9.3, 9.3, 9.3, 9.3, 9.3, 12.7, 12.7), Engine_FuelType = c("Diesel CI",
"Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI",
"Diesel CI", "NG PI", "Diesel CI", "Diesel CI", "Diesel CI",
"Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI",
"Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI", "Diesel CI"
), AirDrag_CdxA_range = c("A11", "A9", "A14", "A14", "A11",
"A14", "A14", "A13", "A14", "A14", "A11", "A11", "A9", "A9",
"A13", "A9", "A9", "A9", "A9", "A9", "A9"), WHTC_FuelConsumption_gkwh = c(216,
216, 216, 216, 201, 216, 216, 290, 216, 216, 201, 201, 216,
216, 216, 216, 216, 216, 216, 201, 201), Engine_RatedPower_kw = c(206,
206, 235, 235, 331, 206, 206, 251, 206, 206, 331, 331, 235,
235, 235, 235, 235, 235, 235, 368, 368)), row.names = c(NA,
-21L), class = c("tbl_df", "tbl", "data.frame"))
Following the date formates in your example, here is what I came up with:
Although I'm not sure how robust this will be. Are the four date formats that you mention always followed?? There are numerous ways to "creatively" enter dates, that might trip you up...