I have two very large datasets. Dataset 1 has sampling locations with lat, lon and date. Like this:
datetime longitude latitude
2022-08-10 03:26:08 147.8521 -20.2443
2020-10-02 16:12:52 152.3652 -23.1234
Dataset 2 has temperature data that I have extracted from a netCDF file. This dataframe is much bigger in terms of spatial extent and data records (as it includes data for many irrelevant locations and every day). Like this:
datetime longitude latitude temp
2022-08-10 12:00:00 147.8601 -20.2423 20.62
2022-08-10 04:30:08 147.8601 -20.2423 21.49
2022-08-11 09:10:23 152.3633 -23.1225 21.55
2020-10-02 16:12:52 152.4213 -23.1562 20.80
2020-10-02 16:12:52 153.4213 -24.1562 21.10
2020-11-01 12:00:00 152.4213 -23.1562 21.33
I would like to find the nearest neighbour from Dataset 2 based on nearest location (lat, lon) first and foremost, then nearest datetime, to populate a new column ('temp') in Dataset 1.
I would like the end result to look like this:
datetime longitude latitude temp
2022-08-10 03:26:08 147.8521 -20.2443 21.49
2020-10-02 16:12:52 152.3652 -23.1234 20.80
I am competent at matching values from different datasets; however, matching based on nearest neighbours in space and time is beyond my skill level.
After hours of fruitless searching I have come up with no answers. Can anyone help?
Here the sp::spDists function is used to calculate the spatial distances between the two datasets (df1 and df2). The code then iterates through each point in df1, identifying the nearest neighbor in df2 based on both spatial and temporal proximity. The thresholds max_acceptable_distance and max_acceptable_time_difference restrict which df2 records can count as a match, so a record that is close in space but far away in time is not selected.
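A minimal sketch of the approach described above, using the sample rows from the question. The threshold values (10 km and 24 hours) are assumptions chosen for illustration, not values given in the original answer, and the loop is one possible way to combine the two criteria rather than the answer's exact code.

```r
library(sp)

# Sample data copied from the question (timezone assumed to be UTC)
df1 <- data.frame(
  datetime  = as.POSIXct(c("2022-08-10 03:26:08", "2020-10-02 16:12:52"), tz = "UTC"),
  longitude = c(147.8521, 152.3652),
  latitude  = c(-20.2443, -23.1234)
)
df2 <- data.frame(
  datetime  = as.POSIXct(c("2022-08-10 12:00:00", "2022-08-10 04:30:08",
                           "2022-08-11 09:10:23", "2020-10-02 16:12:52",
                           "2020-10-02 16:12:52", "2020-11-01 12:00:00"), tz = "UTC"),
  longitude = c(147.8601, 147.8601, 152.3633, 152.4213, 153.4213, 152.4213),
  latitude  = c(-20.2423, -20.2423, -23.1225, -23.1562, -24.1562, -23.1562),
  temp      = c(20.62, 21.49, 21.55, 20.80, 21.10, 21.33)
)

# Assumed thresholds: tune these for your own data
max_acceptable_distance <- 10         # kilometres
max_acceptable_time_difference <- 24  # hours

# Great-circle distances in km; rows index df1, columns index df2
dist_km <- spDists(as.matrix(df1[, c("longitude", "latitude")]),
                   as.matrix(df2[, c("longitude", "latitude")]),
                   longlat = TRUE)

df1$temp <- NA_real_
for (i in seq_len(nrow(df1))) {
  # Absolute time difference in hours between this sample and every df2 record
  dt_hours <- abs(as.numeric(difftime(df2$datetime, df1$datetime[i], units = "hours")))
  # Keep only df2 records inside both thresholds
  ok <- which(dist_km[i, ] <= max_acceptable_distance &
              dt_hours <= max_acceptable_time_difference)
  if (length(ok) > 0) {
    # Nearest location first; ties in distance are broken by nearest datetime
    best <- ok[order(dist_km[i, ok], dt_hours[ok])[1]]
    df1$temp[i] <- df2$temp[best]
  }
}

df1
```

With these sample rows and thresholds, df1$temp comes out as 21.49 and 20.80, matching the desired result in the question. Note that the full distance matrix has nrow(df1) x nrow(df2) entries, so for very large datasets it may be necessary to subset df2 to the relevant region first or to process df1 in chunks.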
However, if I use this code, it gives slightly different results.
Another way you can get it is by
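One possible alternative, sketched here in base R only, is not necessarily the approach the original answer had in mind. It computes haversine distances one df1 row at a time, so it never builds the full distance matrix. The helper names haversine_km and nearest_temp, the 6371 km Earth radius, and the 10 km / 24 hour thresholds are all assumptions introduced for illustration.

```r
# Sample data copied from the question (timezone assumed to be UTC)
df1 <- data.frame(
  datetime  = as.POSIXct(c("2022-08-10 03:26:08", "2020-10-02 16:12:52"), tz = "UTC"),
  longitude = c(147.8521, 152.3652),
  latitude  = c(-20.2443, -23.1234)
)
df2 <- data.frame(
  datetime  = as.POSIXct(c("2022-08-10 12:00:00", "2022-08-10 04:30:08",
                           "2022-08-11 09:10:23", "2020-10-02 16:12:52",
                           "2020-10-02 16:12:52", "2020-11-01 12:00:00"), tz = "UTC"),
  longitude = c(147.8601, 147.8601, 152.3633, 152.4213, 153.4213, 152.4213),
  latitude  = c(-20.2423, -20.2423, -23.1225, -23.1562, -24.1562, -23.1562),
  temp      = c(20.62, 21.49, 21.55, 20.80, 21.10, 21.33)
)

# Hypothetical helper: haversine distance in km on a sphere of assumed radius
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: temp of the best df2 match for a single sampling point
nearest_temp <- function(lon, lat, when, ref, max_km = 10, max_hours = 24) {
  d_km <- haversine_km(lon, lat, ref$longitude, ref$latitude)
  d_hr <- abs(as.numeric(difftime(ref$datetime, when, units = "hours")))
  ok <- which(d_km <= max_km & d_hr <= max_hours)
  if (length(ok) == 0) return(NA_real_)
  # Nearest location first; ties in distance are broken by nearest datetime
  ref$temp[ok[order(d_km[ok], d_hr[ok])[1]]]
}

df1$temp <- vapply(
  seq_len(nrow(df1)),
  function(i) nearest_temp(df1$longitude[i], df1$latitude[i], df1$datetime[i], df2),
  numeric(1)
)

df1
```

Because this uses a spherical Earth, its distances can differ slightly from those of an ellipsoid-based calculation, which may matter for points that sit close to the distance threshold.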