I'm sure this question has been answered elsewhere, but I have not been able to come up with it by searching.
I have points representing cities within a country along with population for each city. I also have a polygon file of counties. I want to find the location of the largest city within each county.
How can this be done?
Here is some data
structure(list(Country = c("us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us",
"us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us"), City =
c("cabarrus", "cox store", "cal-vel", "briarwood townhouses", "barker heights", "davie
crossroads", "crab point village", "azalea", "chesterfield", "charlesmont", "connor", "clover garden", "corriher heights", "callisons", "crestview acres", "clegg", "canaan park", "chantilly", "belgrade", "brices crossroads", "bluff", "butner", "bottom", "bandy", "bostian heights"), AccentCity = c("Cabarrus", "Cox Store", "Cal-Vel", "Briarwood Townhouses", "Barker Heights", "Davie Crossroads", "Crab Point Village", "Azalea", "Chesterfield", "Charlesmont", "Connor", "Clover Garden", "Corriher Heights", "Callisons", "Crestview Acres", "Clegg", "Canaan Park", "Chantilly", "Belgrade", "Brices Crossroads", "Bluff", "Butner", "Bottom", "Bandy", "Bostian Heights"), Region = c("NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC"), Population = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, A_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_), Latitude = (35.2369444, 35.275, 36.4291667, 35.295, 35.3111111, 35.8319444, 34.7602778, 35.58, 35.81, 5.9341667,
35.7419444, 36.1883333, 35.5605556, 35.0841667, 35.0213889, 35.8655556, 36.2761111, 36.3016667, 34.88, 34.8186111, 35.8377778, 36.1319444, 36.4747222, 35.6419444, 35.7544444), Longitude = c(-80.5419444, -82.0352778, -78.9694444, -81.5238889, -82.4441667, -80.535, -76.7305556, -82.4713889, -81.6611111, -81.5127778, -78.1486111, -79.4630556, -80.635, -76.7255556, -80.5427778, -78.8497222,
-79.7852778, -76.1711111, -77.2352778, -78.1016667, -82.8580556, -78.7569444, -80.7741667, -81.09, -80.9294444)), .Names = c("Country", "City", "AccentCity", "Region", "Population", "Latitude", "Longitude"), row.names = c(544L, 889L, 551L, 434L, 190L, 975L, 894L, 147L,
717L, 700L, 831L, 773L, 862L, 559L, 915L, 753L, 584L, 695L, 262L, 437L, 372L, 537L, 406L, 178L, 02L), class = "data.frame")
and some code to read in north carolina
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
plot(xx)
I want to find the city with the maximum population within each county. i'm sorry I don't have a reproducible example. If I did, I would have the answer!
The short answer is that you should use
gContains(...)
in packagergeos
.Here is the long answer.
In the code below, we grab a high resolution shapefile of North Carolina counties from the GADM database, and a geocoded dataset of North Carolina cities from from the US Geological Survey database. The latter already has county information but we ignore that. Then we map cities to their appropriate county using
gContains(...)
, add that information to the cities data frame, and identify the largest city in each county using the data.table package. Most of the work is in 4 lines of code near the end.The workhorse here is the line:
This compares every point in the SpatialPointsDataFrame
NC.Cities
to every Polygon in the SpatialPolygonsDataFrameNC.counties
and returns a logical matrix where tthe rows represent cities and the columns represent counties, and the[i,j]
element isTRUE
if cityi
is in countyj
,FALSE
otherwise. We process the matrix row-wise in the next statement:using each row in succession to index the attributes table for
NC.counties
to extract the county name.The data you provided in your question has some problems which are nevertheless instructive. First, the NC shapefile in the
maptools
package is relatively low resolution. In particular this means that some of the coastal islands are completely missing, so any city on one of those islands will not map to a county. You might have the same problem with your real data so watch out for it.Second, comparing the
COUNTY
column in the original USGS dataset with thecounty
column which we added, there are 3 (out of 865) counties that do not agree. It turns out that, in those cases, the USGS database was wrong (or out of date). You might have the same problem so watch out for that too.Third, an additional three cities did not map to any county. These were all coastal cities and probably reflect small inaccuracies in the North Carolina shapefile. You night have this problem as well.