Aloha,
I am planning to run a case-control study for study sites that are evenly distributed spatially around the country. I need to select each case in the dataset and then match it to x controls. We will use a sensitivity analysis to choose the optimal number of controls, so I need to be able to run the matching for 1, 2, 3, 4, 5, 6, 7, 8 or more controls per case. Because the data have a spatial component, I want to restrict the matching with a distance matrix, selecting only controls within 25,000 m (25 km) of each case.
I have not been able to find a suitable algorithm for this computation in R. Does anyone know of an R package that would help me achieve this?
Thank you
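One possible approach to the matching described above is a minimal base-R sketch of distance-constrained matching. It assumes the coordinates are already in a projected CRS measured in metres, so Euclidean distance is meaningful. The example data and the `dist_matrix()` and `match_k()` helpers are hypothetical. Controls are drawn without replacement within a case but may be reused across cases. If each control may be used only once, extra bookkeeping is needed.

```r
# Hypothetical example data: projected coordinates in metres (e.g. a UTM CRS).
set.seed(42)
cases <- data.frame(id = paste0("case", 1:5),
                    x = runif(5, 0, 200000), y = runif(5, 0, 200000))
controls <- data.frame(id = paste0("ctrl", 1:200),
                       x = runif(200, 0, 200000), y = runif(200, 0, 200000))

max_dist <- 25000  # matching radius in metres

# Hypothetical helper: case-by-control Euclidean distance matrix
# (rows = cases, columns = controls).
dist_matrix <- function(a, b) {
  dx <- outer(a$x, b$x, "-")
  dy <- outer(a$y, b$y, "-")
  sqrt(dx^2 + dy^2)
}
d <- dist_matrix(cases, controls)

# For each case, the column indices of controls within max_dist.
eligible <- lapply(seq_len(nrow(cases)), function(i) which(d[i, ] <= max_dist))

# Hypothetical helper: draw up to k controls per case,
# without replacement within a case (controls may repeat across cases).
match_k <- function(k) {
  rows <- lapply(seq_len(nrow(cases)), function(i) {
    pool <- eligible[[i]]
    n <- min(k, length(pool))
    if (n == 0) return(NULL)  # no control within max_dist of this case
    # Index the pool by sampled positions; sample(pool, n) would misbehave
    # when pool has length 1.
    picked <- pool[sample.int(length(pool), n)]
    data.frame(case_id = cases$id[i],
               control_id = controls$id[picked],
               distance_m = d[i, picked],
               k = k)
  })
  do.call(rbind, rows)
}

# Sensitivity analysis: repeat the matching for k = 1, ..., 8 controls per case.
results <- lapply(1:8, match_k)
names(results) <- paste0("k", 1:8)
sapply(results, NROW)  # number of matched pairs for each k
```

The full matrix grows as (number of cases) × (number of controls), so it becomes memory-heavy for large datasets. With `sf` objects, `sf::st_is_within_distance(cases, controls, dist = 25000)` returns a comparable list of eligible control indices per case without building the dense matrix.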
To solve this, I did the following:
Got the coordinates of each site centroid (x, y).
Split the database into case and control groups.
Created a spatial buffer around each case.
Intersected the controls with the case buffers.
Assigned a label (match_no) to all intersections.
Randomly sampled controls within each match_no group.
Code below.
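The following illustrates the last two steps, labelling the intersections with match_no and sampling within each group. It is a base-R sketch and not the poster's code. It assumes the intersection step produced a table with one row per (case buffer, control) pair. The `intersections` data and the `sample_within_groups()` helper are hypothetical.

```r
# Hypothetical intersection output: one row per (case buffer, control) pair.
# match_no identifies the case whose buffer the control fell inside; a control
# (here c03) can appear under several match_no values when buffers overlap.
intersections <- data.frame(
  match_no   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  control_id = c("c01", "c02", "c03", "c04", "c03", "c05",
                 "c06", "c07", "c08", "c09", "c10")
)

# Hypothetical helper: randomly sample up to k rows within each match_no group.
sample_within_groups <- function(df, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducible draws
  groups <- split(df, df$match_no)
  picked <- lapply(groups, function(g) {
    g[sample.int(nrow(g), min(k, nrow(g))), , drop = FALSE]
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

# Repeat for each number of controls used in the sensitivity analysis.
sens <- lapply(1:8, function(k) sample_within_groups(intersections, k, seed = k))
sapply(sens, nrow)  # 3 6 8 10 11 11 11 11
```

Groups with fewer eligible controls than k simply return all of them, which is why the row counts plateau at 11. Because c03 sits in two buffers, it can be drawn for both cases. Enforcing one case per control would need an extra de-duplication step.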