Fuzzy Join with POSIXct and POSIXt

136 views Asked by At
test1 <- structure(list(trip_count = 1:10, pickup_datetime = structure(c(1357019059, 
1357019939, 1357022493, 1357023065, 1357024439, 1357025235, 1357026348, 
1357026924, 1357027562, 1357028863), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), dropoff_datetime = structure(c(1357019158, 1357021384, 
1357023008, 1357024189, 1357024694, 1357025815, 1357026604, 1357027240, 
1357027830, 1357029381), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))
test2 <- structure(list(DATE = structure(c(1357001460, 1357005060, 1357008660, 
1357012260, 1357015860, 1357019460, 1357023060, 1357026660, 1357030260, 
1357033860), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    HourlyDryBulbTemperature = c(39L, 38L, 39L, 39L, 39L, 39L, 
    39L, 38L, 39L, 39L), HourlyPrecipitation = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")

Hey guys, so I have two data frames and I want to join them based on the date and time. It should be a fuzzy_join so that the join works if DATE from test2 is within pickup_datetime and dropoff_datetime from test1. I tried

test1 <- fuzzy_left_join(test1,test2,by = c("DATE" = "pickup_datetime", "DATE" = "dropoff_datetime"),match_fun = list(`>=`, `<=`))

but this returns: Error: All columns in a tibble must be vectors. Column "col" is NULL.

UPDATE: I found a solution

dropoff_data <- str_split_fixed(test1$dropoff_datetime, " ", 2)
colnames(dropoff_data) <- c("join_date","dropoff_time")
test1 <- cbind.data.frame(test1,dropoff_data)
test1$join_time <- hour(hms(as.character(test1$dropoff_time)))
rm(dropoff_data)

dropoff_data <- str_split_fixed(test2$DATE, " ", 2)
colnames(dropoff_data) <- c("join_date","time")
test2 <- cbind.data.frame(test2,dropoff_data)
test2$join_time <- hour(hms(as.character(test2$time)))
rm(dropoff_data)

test1 <- left_join(test1,test2,by = c("join_date","join_time"))

Thanks everyone!

1

There are 1 answers

3
MrSmithGoesToWashington On

May be something like this could help you :

library(data.table)
setDT(test1)
setDT(test2)
t <- test1[, c(.SD, as.list(test2)), by = 1:nrow(test1)]
t[DATE >= pickup_datetime & DATE <= dropoff_datetime]