test1 <- structure(list(trip_count = 1:10, pickup_datetime = structure(c(1357019059,
1357019939, 1357022493, 1357023065, 1357024439, 1357025235, 1357026348,
1357026924, 1357027562, 1357028863), tzone = "UTC", class = c("POSIXct",
"POSIXt")), dropoff_datetime = structure(c(1357019158, 1357021384,
1357023008, 1357024189, 1357024694, 1357025815, 1357026604, 1357027240,
1357027830, 1357029381), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
test2 <- structure(list(DATE = structure(c(1357001460, 1357005060, 1357008660,
1357012260, 1357015860, 1357019460, 1357023060, 1357026660, 1357030260,
1357033860), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
HourlyDryBulbTemperature = c(39L, 38L, 39L, 39L, 39L, 39L,
39L, 38L, 39L, 39L), HourlyPrecipitation = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")
Hey guys, so I have two data frames and I want to join them based on the date and time. It should be a fuzzy_join
so that the join works if DATE
from test2
is within pickup_datetime
and dropoff_datetime
from test1
.
I tried
test1 <- fuzzy_left_join(test1,test2,by = c("DATE" = "pickup_datetime", "DATE" = "dropoff_datetime"),match_fun = list(`>=`, `<=`))
but this returns: Error: All columns in a tibble must be vectors. Column "col" is NULL.
UPDATE: I found a solution
dropoff_data <- str_split_fixed(test1$dropoff_datetime, " ", 2)
colnames(dropoff_data) <- c("join_date","dropoff_time")
test1 <- cbind.data.frame(test1,dropoff_data)
test1$join_time <- hour(hms(as.character(test1$dropoff_time)))
rm(dropoff_data)
dropoff_data <- str_split_fixed(test2$DATE, " ", 2)
colnames(dropoff_data) <- c("join_date","time")
test2 <- cbind.data.frame(test2,dropoff_data)
test2$join_time <- hour(hms(as.character(test2$time)))
rm(dropoff_data)
test1 <- left_join(test1,test2,by = c("join_date","join_time"))
Thanks everyone!
May be something like this could help you :