I'm new to geospatial stats and can't figure out a simple question:
I have two datasets with spatial coordinates. One has coordinates of hospitals and clinics in a particular district. The other has coordinates of all households in that district.
Here's some mock data:
    hospital_coord <- data.frame(
      longitude = c(80.15998, 72.89125, 77.65032, 77.60599),
      latitude = c(12.90524, 19.08120, 12.97238, 12.90927))
    people_coord <- data.frame(
      longitude = c(72.89537, 77.65094, 73.95325, 72.96746,
                    77.65058, 77.66715, 77.64214, 77.58415,
                    77.76180, 76.65470, 76.65480, 76.65490, 76.65500, 76.65560, 76.65560),
      latitude = c(19.07726, 13.03902, 18.50330, 19.16764,
                   12.90871, 13.01693, 13.00954, 12.92079,
                   13.02212, 12.81447, 12.81457, 12.81467, 12.81477, 12.81487, 12.81497))
I would like to calculate the following:
- The percentage of households that live more than 2 kilometres from the nearest clinic/hospital
- A column in the dataframe indicating whether each household is within or outside the 2km distance
I think this does what you want, using the more recent sf package rather than geosphere from the question linked. The approach is as follows:
- Convert both data frames to sf objects with st_as_sf.
- Use st_distance to compute the distance between each person and each hospital as a units table, in metres.
- Convert the units table into a regular tbl, because it is a pain to deal with, and check which pairs have more than 2km separation.
- Use mutate_at to check each row to see whether each hospital is less than 2km away (T) or more than 2km away (F).
- Use pmap and any to check each row and see if at least one hospital is within 2km.

It looks like only the first household is within 2km of a hospital.
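The approach above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the answer: it assumes the coordinates are WGS84 longitude/latitude (EPSG:4326, which the question does not state), and it swaps the mutate_at/pmap/any step for a base R row minimum over the distance matrix, which answers the same "is at least one hospital within 2km" question. The names dist_mat, nearest_m, within_2km and pct_beyond_2km are made up for illustration.

```r
library(sf)

hospital_coord <- data.frame(
  longitude = c(80.15998, 72.89125, 77.65032, 77.60599),
  latitude = c(12.90524, 19.08120, 12.97238, 12.90927))
people_coord <- data.frame(
  longitude = c(72.89537, 77.65094, 73.95325, 72.96746,
                77.65058, 77.66715, 77.64214, 77.58415,
                77.76180, 76.65470, 76.65480, 76.65490, 76.65500, 76.65560, 76.65560),
  latitude = c(19.07726, 13.03902, 18.50330, 19.16764,
               12.90871, 13.01693, 13.00954, 12.92079,
               13.02212, 12.81447, 12.81457, 12.81467, 12.81477, 12.81487, 12.81497))

# Convert both tables to sf point objects.
# Assumption: coordinates are WGS84 lon/lat (EPSG:4326).
hospitals_sf <- st_as_sf(hospital_coord, coords = c("longitude", "latitude"), crs = 4326)
people_sf <- st_as_sf(people_coord, coords = c("longitude", "latitude"), crs = 4326)

# Pairwise distances: one row per household, one column per hospital,
# returned as a units matrix in metres.
dist_units <- st_distance(people_sf, hospitals_sf)

# Drop the units class to get a plain numeric matrix (column-major order is kept).
dist_mat <- matrix(as.numeric(dist_units), nrow = nrow(people_sf))

# Distance from each household to its nearest hospital, in metres.
nearest_m <- apply(dist_mat, 1, min)

# Flag households with at least one hospital within 2 km.
people_coord$within_2km <- nearest_m <= 2000

# Percentage of households more than 2 km from the nearest hospital.
pct_beyond_2km <- 100 * mean(!people_coord$within_2km)
print(pct_beyond_2km)
```

With the mock data this should flag only the first household as within 2km, which matches the result above, so 14 of the 15 households (about 93%) fall outside the 2km distance.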