I have a movement dataset and a map of Canada as a shapefile (.shp). An example of the movement dataset looks like this (the coordinates are projected in EPSG:5321):
longitude latitude DateTimeRounded
459663.3 7181890 2007-09-10
459734.2 7181938 2007-09-11
459680.5 7181933 2007-09-12
459640.1 7181893 2007-09-13
459605.2 7181897 2007-09-14
459928.7 7182175 2007-09-15
459855.1 7182104 2007-09-16
I'm not sure how to attach or show the shapefile, but if anyone has suggestions, please let me know!
My goal is to create a new column in my dataset called "region", which indicates the province each data point falls in. If a data point is outside the borders of every province (i.e., in the sea), I want the associated entry in the region column to be NA.
The final dataset would look something like this:
longitude latitude DateTimeRounded region
459663.3 7181890 2007-09-10 Manitoba
459734.2 7181938 2007-09-11 Manitoba
459680.5 7181933 2007-09-12 Manitoba
459640.1 7181893 2007-09-13 NA
459605.2 7181897 2007-09-14 NA
459928.7 7182175 2007-09-15 NA
459855.1 7182104 2007-09-16 Ontario
Originally, I used the package rgdal to do this, but since it has been retired I'm not sure how to go about it anymore. Does anyone know how I can do this using other spatial packages like sp or terra?
This is what the original code looked like for reference:
setwd("CanadaMap/") # Directory for Canada Map
CANmap = readOGR(dsn=".", layer= "canada 5321")#File path to Canada Map
coordinates(df) = ~longitude + latitude #change longitude/latitude to what the lat/long columns are in your data frame
proj4string(df) <- CRS("+init=epsg:5321") #Use the correct reference system for your data
df <- spTransform(df, CRS("+proj=lcc +lat_1=44.5 +lat_2=54.5 +lat_0=0 +lon_0=-84 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")) #denotes what CRS to use for df (same as mapCRS above)
df$region = over(df,CANmap)$NAME #adds column for region (which was province)
df<-as.data.frame(df)
The sf package may work for your case. Once we have our data prepared, then it's only a few more lines of code:
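As a rough sketch of that workflow, the example below uses the North Carolina counties shapefile that ships with sf as a stand-in for the Canada map. The point coordinates, the object names (nc, pts, pts_sf, joined) and the choice of EPSG:32119 as the common projected CRS are assumptions made for illustration. With your data you would presumably read your own shapefile with st_read(), build the points with crs = 5321, and transform them to the map's CRS before joining.

```r
library(sf)

# North Carolina counties bundled with sf, standing in for the Canada shapefile
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)  # assumed projected CRS (NAD83 / North Carolina)

# Hypothetical points in lon/lat; the last one lies out in the Atlantic
pts <- data.frame(
  longitude = c(-81.5, -78.6, -75.0),
  latitude = c(36.4, 35.8, 34.0),
  DateTimeRounded = as.Date(c("2007-09-10", "2007-09-11", "2007-09-12"))
)

# Turn the data frame into an sf object, then match the polygons' CRS
pts_sf <- st_as_sf(pts, coords = c("longitude", "latitude"), crs = 4326)
pts_sf <- st_transform(pts_sf, st_crs(nc))

# Left spatial join: every point is kept, NAME is NA where no polygon contains it
joined <- st_join(pts_sf, nc["NAME"], join = st_within, left = TRUE)
joined
```

Subsetting the polygons to nc["NAME"] keeps only that attribute (plus the geometry), so the join adds a single column. st_join() uses st_intersects by default; st_within is passed here only to make the point-in-polygon intent explicit.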
In this case, NAME corresponds to county in the North Carolina shapefile, but in your dataset would correspond to Region. Our spatial join specifies left = TRUE to signify a left join to keep all of our lat/lon data and fill in NAs if there's no matching region that the coordinates fall within from our shapefile. Note that the sf package uses data.frames with a geometry list column to store spatial data. Depending on what else you need to do afterwards, this may be fine or you may need to transform the geometry list column back into separate lat/lon columns.
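If you do need plain coordinate columns again, one possible approach is to extract them with st_coordinates() and then drop the geometry with st_drop_geometry(). The small data frame below is made up to mimic the result of a spatial join (two of the points from the question with a hypothetical region column), and the names pts_sf, xy and out are illustrative only.

```r
library(sf)

# Hypothetical joined result: projected points (EPSG:5321) with a region column
pts <- data.frame(
  longitude = c(459663.3, 459734.2),
  latitude = c(7181890, 7181938),
  region = c("Manitoba", NA)
)
pts_sf <- st_as_sf(pts, coords = c("longitude", "latitude"), crs = 5321)

# Extract the X/Y matrix from the geometry column, then drop the geometry
xy <- st_coordinates(pts_sf)
out <- data.frame(
  longitude = xy[, "X"],
  latitude = xy[, "Y"],
  st_drop_geometry(pts_sf)
)
out
```

Alternatively, passing remove = FALSE to st_as_sf() keeps the original coordinate columns alongside the geometry, in which case st_drop_geometry() alone should be enough. Note that those retained columns would still hold the coordinates as originally entered, not any values produced by a later st_transform().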