Distance between a set of points and a polygon with sf in R

11.3k views Asked by At

I have a dataframe of points on map and an area of interest described as a polygon of points. I want to calculate the distance between each of the points to the polygon, ideally using the sf package.

library("tidyverse")
library("sf")

# area of interest
area <- 
  "POLYGON ((121863.900623145 486546.136633659, 121830.369032584 486624.24942906, 121742.202408334 486680.476675484, 121626.493982203 486692.384434804, 121415.359596921 486693.816446951, 121116.219703244 486773.748535465, 120965.69439283 486674.642759986, 121168.798757601 486495.217550029, 121542.879304342 486414.780364836, 121870.487595417 486512.71203006, 121863.900623145 486546.136633659))"

# convert to sf and project on a projected coord system
area <- st_as_sfc(area, crs = 7415L)

# points with long/lat coords
pnts <- 
  data.frame(
    id = 1:3, 
    long = c(4.85558, 4.89904, 4.91073),
    lat = c(52.39707, 52.36612, 52.36255)
    )

# convert to sf with the same crs
pnts_sf <- st_as_sf(pnts, crs = 7415L, coords = c("long", "lat"))

# check if crs are equal
all.equal(st_crs(pnts_sf),st_crs(area))

I am wondering why the following approaches do not give me the correct answer.

1.Simply using the st_distance fun-doesn't work, wrong answer

st_distance(pnts_sf, area)

2.In a mutate call - all wrong answers

pnts_sf %>% 
  mutate(
    distance = st_distance(area, by_element = TRUE),
    distance2 = st_distance(area, by_element = FALSE),
    distance3 = st_distance(geometry, area, by_element = TRUE)
  )

However this approach seems to work and gives correct distances.

3.map over the long/lat - works correctly

pnts_geoms <- 
  map2(
    pnts$long, 
    pnts$lat, 
    ~ st_sfc(st_point(c(.x, .y)) , crs = 4326L) 
  ) %>% 
  map(st_transform, crs = 7415L)

map_dbl(pnts_geoms, st_distance, y = area)

I'm new to spatial data and I'm trying to learn the sf package so I'm wondering what is going wrong here. As far as i can tell, the first 2 approaches somehow end up considering the points "as a whole" (one of the points is inside the area polygon so i guess that's why one of the wrong answers is 0). The third approach is considering a point at a time which is my intention.
Any ideas how can i get the mutate call to work as well?

I'm on R 3.4.1 with

> packageVersion("dplyr")
[1] ‘0.7.3’
> packageVersion("sf")
[1] ‘0.5.5’
1

There are 1 answers

1
davidski On

So it turns out that the whole confusion was caused by a small silly oversight on my part. Here's the breakdown:

  • The points dataframe comes from a different source (!) than the area polygon.
  • Overseeing this I kept trying to set them to crs 7415 which is a legal but incorrect move and led eventually to the wrong answers.
  • The right approach is to convert them to sf objects in the crs they originate from, transform them to the one the area object is in and then proceed to compute the distances.

Putting it all together:

# this part was wrong, crs was supposed to be the one they were
# originally coded in
pnts_sf <- st_as_sf(pnts, crs = 4326L, coords = c("long", "lat"))

# then apply the transformation to another crs
pnts_sf <- st_transform(pnts_sf, crs = 7415L)

st_distance(pnts_sf, area)

--------------------------
Units: m
          [,1]
[1,] 3998.5701
[2,]    0.0000
[3,]  751.8097