I've got a dataframe that looks like this:
long lat site
-141.37 61.13 x1
-149.1833 66.7333 x2
-149.667 67.667 x3
-141.3667 61.1157 x4
I want to calculate the distances between all of the sites using distVincentyEllipsoid. Then, for sites that are located within 5 km of each other, I want to modify the site name to include both sites. In this example, x1 and x4 are within 5 km of each other, so the result would look like this:
long lat site
-141.37 61.13 x1_x4
-149.1833 66.7333 x2
-149.667 67.667 x3
-141.3667 61.1157 x1_x4
I know I can calculate a distance matrix between all sites in this way:
library(dplyr)
library(geosphere)
xy <- df %>% dplyr::select(long, lat)
distm(xy, fun = distVincentyEllipsoid)
But I don't know how to take it from there.
It helps if you provide the example data as R code, but thank you for showing the expected output.
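For reference, the example data from the question could be written as R code along these lines. The object name df follows the question's own code; the values are copied from the table shown there.

```r
# Example data from the question, as a reproducible data frame
df <- data.frame(
  long = c(-141.37, -149.1833, -149.667, -141.3667),
  lat  = c(61.13, 66.7333, 67.667, 61.1157),
  site = c("x1", "x2", "x3", "x4"),
  stringsAsFactors = FALSE
)
```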
Solution:
As you suggested, first make a distance matrix. Then classify each distance as within the threshold or not, and use the rows of that result to select the records. Note that I use distGeo; it is a better method than distVincentyEllipsoid.
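The steps above can be sketched as follows. This is a minimal illustration using the question's example data, not necessarily the exact code originally posted; the names m, near and new_site are chosen here for the example, and the 5000 m threshold is the 5 km from the question.

```r
library(geosphere)

df <- data.frame(
  long = c(-141.37, -149.1833, -149.667, -141.3667),
  lat  = c(61.13, 66.7333, 67.667, 61.1157),
  site = c("x1", "x2", "x3", "x4"),
  stringsAsFactors = FALSE
)

# Pairwise geodesic distances in metres; rows and columns follow the row order of df
m <- distm(df[, c("long", "lat")], fun = distGeo)

# TRUE where two sites are within 5 km of each other
# (the diagonal is TRUE as well, since each site is 0 m from itself)
near <- m < 5000

# For each row, paste together the names of all sites within the threshold
new_site <- apply(near, 1, function(i) paste(df$site[i], collapse = "_"))
df$site <- new_site
df
```

With the example data, x1 and x4 both become x1_x4, while x2 and x3 keep their names. Each row lists only its own neighbours, so if sites form a chain (A near B, B near C, but A not near C) the rows will get different labels.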
If you have many points, the distance matrix may become too large. In that case you could do the computation in a way that avoids holding the full matrix in memory.
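One way to do that, as a sketch under the assumption that processing one point at a time is acceptable, is to call distGeo for each row against all points, so that only a single vector of distances exists at any moment. The names xy and new_site are chosen here for illustration.

```r
library(geosphere)

df <- data.frame(
  long = c(-141.37, -149.1833, -149.667, -141.3667),
  lat  = c(61.13, 66.7333, 67.667, 61.1157),
  site = c("x1", "x2", "x3", "x4"),
  stringsAsFactors = FALSE
)

xy <- as.matrix(df[, c("long", "lat")])
new_site <- character(nrow(df))

for (i in seq_len(nrow(df))) {
  # Distances in metres from point i to every point: a vector, not a matrix
  d <- distGeo(xy[i, ], xy)
  # Collect the names of all sites within 5 km of point i (including itself)
  new_site[i] <- paste(df$site[d < 5000], collapse = "_")
}

df$site <- new_site
df
```

This still does the same number of distance calculations as the full matrix; it only trades memory for a loop.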