I'm new to geospatial stats and can't figure out a simple question:
I have two datasets with spatial coordinates. One has coordinates of hospitals and clinics in a particular district. The other has coordinates of all households in that district.
Here's some mock data:
hospital_coord <- data.frame(
  longitude = c(80.15998, 72.89125, 77.65032, 77.60599),
  latitude = c(12.90524, 19.08120, 12.97238, 12.90927))
people_coord <- data.frame(
  longitude = c(72.89537, 77.65094, 73.95325, 72.96746,
                77.65058, 77.66715, 77.64214, 77.58415,
                77.76180, 76.65470, 76.65480, 76.65490, 76.65500, 76.65560, 76.65560),
  latitude = c(19.07726, 13.03902, 18.50330, 19.16764,
               12.90871, 13.01693, 13.00954, 12.92079,
               13.02212, 12.81447, 12.81457, 12.81467, 12.81477, 12.81487, 12.81497))
I would like to calculate the following:
- What percentage of households live more than 2 kilometres from the nearest clinic/hospital
- Create a column in the dataframe indicating which households are within or outside the 2km distance
I think this does what you want, using the more recent `sf` package rather than `geosphere` from the question linked. The approach is as follows:
- Use `st_as_sf` to convert both data frames into `sf` point objects.
- Use `st_distance` to compute the distance between each person and each hospital as a `units` table, in metres.
- Convert the `units` table into a regular `tbl` because it is a pain to deal with, and check which pairs have more than 2km separation.
- Use `mutate_at` to check each row to see whether each hospital is less than 2km away (`T`) or more than 2km away (`F`).
- Use `pmap` and `any` to check each row and see if at least one hospital is within 2km!
It looks like only the first patient is within 2km of a hospital.
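
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code the steps describe: it swaps the `mutate_at` / `pmap` row checks for a base R row-wise minimum, and it assumes the coordinates are WGS84 longitude/latitude (EPSG:4326), which the question does not state. The column names `nearest_m` and `within_2km` and the variable `pct_beyond_2km` are made up for illustration.

```r
library(sf)

# Mock data from the question
hospital_coord <- data.frame(
  longitude = c(80.15998, 72.89125, 77.65032, 77.60599),
  latitude = c(12.90524, 19.08120, 12.97238, 12.90927))
people_coord <- data.frame(
  longitude = c(72.89537, 77.65094, 73.95325, 72.96746,
                77.65058, 77.66715, 77.64214, 77.58415,
                77.76180, 76.65470, 76.65480, 76.65490, 76.65500, 76.65560, 76.65560),
  latitude = c(19.07726, 13.03902, 18.50330, 19.16764,
               12.90871, 13.01693, 13.00954, 12.92079,
               13.02212, 12.81447, 12.81457, 12.81467, 12.81477, 12.81487, 12.81497))

# Convert both data frames to sf point objects.
# Assumption: coordinates are WGS84 lon/lat (EPSG:4326).
hospitals_sf <- st_as_sf(hospital_coord,
                         coords = c("longitude", "latitude"), crs = 4326)
people_sf <- st_as_sf(people_coord,
                      coords = c("longitude", "latitude"), crs = 4326)

# Distance matrix in metres: one row per household, one column per hospital
dist_m <- st_distance(people_sf, hospitals_sf)

# Drop the units class so ordinary numeric functions apply,
# then take the distance to the nearest hospital for each household
nearest_m <- apply(units::drop_units(dist_m), 1, min)

# Flag households within 2 km of at least one hospital
people_coord$nearest_m <- nearest_m
people_coord$within_2km <- nearest_m <= 2000

# Percentage of households more than 2 km from the nearest hospital
pct_beyond_2km <- 100 * mean(!people_coord$within_2km)

print(people_coord)
cat(sprintf("%.1f%% of households are more than 2 km away\n", pct_beyond_2km))
```

With the mock data, only the first household is flagged as within 2km, so 14 of the 15 households (about 93.3%) fall outside it. Taking the row-wise minimum also gives the distance to the nearest facility for free, which the per-hospital `T`/`F` columns do not.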