I am looking for a way to calculate the separation distance between points in a pairwise fashion and store the results for each individual point in an accompanying nested data frame.
For example, I have this data frame (from the `maps` package) that contains information about US cities, including their physical locations. I have discarded the rest of the information and nested the coordinates in a list-column. I intend to use `distHaversine()` from the `geosphere` package to calculate these distances.
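```r
library(tidyverse)

# First 20 cities; keep only the name and coordinates,
# then nest long/lat into a list-column called `coords`
df <- maps::us.cities %>%
  as_tibble() %>%
  slice(1:20) %>%
  select(name, long, lat) %>%
  nest(coords = c(long, lat))
```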
```
  name       coords
  <chr>      <list>
1 Abilene TX <tibble [1 x 2]>
2 Akron OH   <tibble [1 x 2]>
3 Alameda CA <tibble [1 x 2]>
4 Albany GA  <tibble [1 x 2]>
5 Albany NY  <tibble [1 x 2]>
...(With 15 more rows)
```
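For reference, here is a minimal sketch of `distHaversine()` on a single pair of points. The coordinates are approximate values for Abilene TX and Akron OH, typed in by hand for illustration rather than taken from `maps::us.cities`. Note that `geosphere` expects each point as longitude first, then latitude, and returns metres (its default Earth radius is `r = 6378137`).

```r
library(geosphere)

# Approximate coordinates, typed in for illustration only,
# written as c(longitude, latitude): geosphere expects longitude first
abilene <- c(-99.74, 32.45)  # Abilene TX
akron   <- c(-81.52, 41.08)  # Akron OH

# Great-circle distance in metres (default Earth radius r = 6378137 m)
d_m <- distHaversine(abilene, akron)

# The same distance in kilometres
d_km <- d_m / 1000
print(d_km)
```

Swapping the order to `c(lat, long)` would not raise an error but would silently give wrong distances, so it is worth keeping the column order explicit.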
I have looked into using the `map()` family of functions from `purrr` together with `mutate()`, but I am having a difficult time. The desired result has this form:
```
  name       coords           sep_dist
  <chr>      <list>           <list>
1 Abilene TX <tibble [1 x 2]> <tibble [19 x 2]>
2 Akron OH   <tibble [1 x 2]> <tibble [19 x 2]>
3 Alameda CA <tibble [1 x 2]> <tibble [19 x 2]>
4 Albany GA  <tibble [1 x 2]> <tibble [19 x 2]>
5 Albany NY  <tibble [1 x 2]> <tibble [19 x 2]>
...(With 15 more rows)
```
Each `sep_dist` tibble should look something like this (the distances are completely made up):
```
  location   distance
  <chr>         <dbl>
1 Akron OH       1003
2 Alameda CA      428
3 Albany GA      3218
4 Albany NY      3627
5 Albany OR        97
...(With 14 more rows)
```
Here, `location` is the city being compared with `name` (in this case, Abilene TX).
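One way to reach this shape with the `mutate()` + `map()` approach mentioned above is to map over each row's `name` and `coords` and compare that point against a flat copy of all the coordinates. This is a sketch under assumptions: it uses tidyr 1.0+ `nest()` syntax, reports distances in kilometres, and the intermediate names `all_coords` and `df_sep` are hypothetical.

```r
library(tidyverse)
library(geosphere)

# Nested input as in the question (tidyr >= 1.0 syntax)
df <- maps::us.cities %>%
  as_tibble() %>%
  slice(1:20) %>%
  select(name, long, lat) %>%
  nest(coords = c(long, lat))

# Flat copy of every city's coordinates, used as the comparison set
all_coords <- df %>% unnest(coords)

df_sep <- df %>%
  mutate(sep_dist = map2(name, coords, function(this_name, this_coords) {
    # Every other city (drop the point itself)
    others <- filter(all_coords, name != this_name)
    tibble(
      location = others$name,
      # The single point is recycled against each row of `others`;
      # columns are (long, lat); result converted from metres to km
      distance = distHaversine(
        as.matrix(this_coords[, c("long", "lat")]),
        as.matrix(others[, c("long", "lat")])
      ) / 1000
    )
  }))
```

Each element of `df_sep$sep_dist` is then a 19 x 2 tibble with `location` and `distance` columns. This computes every distance twice (A to B and B to A), which is harmless for 20 cities but worth keeping in mind for larger inputs.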
We can expand a "grid" with every combination of city name and coordinates, and then remove the combinations that pair a city with itself. After that, use `map2_dbl` to apply the `distHaversine` function to each remaining pair. To create the final output, we can `group_by` the name and `nest` all the other desired columns. Each data frame in the resulting `data` column then holds one row per other city, with its `location` and `distance`.
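The steps described above can be sketched as follows. This is a reconstruction, not the answer's original code: the "grid" is built here with `dplyr::cross_join()` (assumes dplyr 1.1.0 or later), the intermediate names `pairs`, `loc_coords`, `result` and `result_full` are hypothetical, and distances are converted to kilometres.

```r
library(tidyverse)
library(geosphere)

# Nested input as in the question (tidyr >= 1.0 syntax)
df <- maps::us.cities %>%
  as_tibble() %>%
  slice(1:20) %>%
  select(name, long, lat) %>%
  nest(coords = c(long, lat))

# Every ordered pair of cities; the second copy's columns are renamed
# so both cities' names and coordinates sit side by side
pairs <- cross_join(
  df,
  rename(df, location = name, loc_coords = coords)
) %>%
  # Drop pairs where a city is compared with itself
  filter(name != location) %>%
  # Each coords tibble is 1 x 2 with columns (long, lat),
  # which is the order distHaversine() expects
  mutate(distance = map2_dbl(
    coords, loc_coords,
    ~ distHaversine(as.matrix(.x), as.matrix(.y)) / 1000  # metres -> km
  ))

# One row per city, with the other cities and distances nested in `data`
result <- pairs %>%
  select(name, location, distance) %>%
  group_by(name) %>%
  nest() %>%
  ungroup()

# Optional: attach the results to the original coords as `sep_dist`
result_full <- df %>%
  left_join(rename(result, sep_dist = data), by = "name")
```

The last step recovers the `name` / `coords` / `sep_dist` layout from the question; if only the distances are needed, `result` on its own already has one 19-row data frame per city.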