Have a Question on Mapping with R, specifically around the choropleth maps in R.
I have a dataset of ZIP codes assigned to an are and some associated data (dataset is here).
My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting area probability and Area Customer Total on a Map. I have tried to do this by using the census TIGER Shapefiles but I guess R cannot handle the complete country.
I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?
To be a little more detailed, here's the point where R stops working -
shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")
(where the shp file is the census/TIGER) shape file.
Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output
require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")
Error: cannot allocate vector of size 317 Kb
There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.
The biggest issue I came across was that the US Census Bureau zip code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using
readOGR(...)
the R SpatialPolygonDataFrame object is about 913MB. Trying to process a file this size, (e.g., converting to a data frame usingfortify(...)
), at least on my system, resulted in errors like the one you identified above. So the solution is to subset the file based in the zip codes that are actually in your data.This map:
was made from your data using the following code.
Explanation:
The
rgdal
package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial objectmap
, then the two sections can be referenced asmap@polygons
andmap@data
. The basic challenge in making choropleth maps is to associate data from yourSample.csv
file, with the relevant polygons (zip codes).So the basic workflow is as follows:
Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate
Probability_1
with each zip code. This is a three step process.Each polygon in the Spatial data file has a unique ID, but these ID's are not the zip codes. The polygon ID's are stored as row names in
map@data
. The zip codes are stored inmap@data
, in columnZCTA5CE10
. So first we must create a data frame that associates themap@data
row names (id
) withmap@data$ZCTA5CE10
(ZIP
). Then we merge yourSample.csv
with the result using the ZIP field in both data frames. Then we merge the result of that intomap.df
. This can be done in 3 lines of code.Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns
long
,lat
, andgroup
inmap.df
are all created by the call tofortify(...)
. The call togeom_polygon(...)
tells ggplot to draw polygons and fill using the information inmap.df$Probability_1
. The call togeom_path(...)
tells ggplot to create a layer with state boundaries. The call toscale_fill_gradientn(...)
tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call tocoord_equal(...)
tells ggplot to use the same scale for x and y so the map is not distorted.NB: The state boundary layer, uses the US States TIGER file.