I have two data sets that I want to merge in order to find the census tract for each point based on its longitude and latitude.
The first data set is the New York City Airbnb Open Data, and its latitude and longitude columns look like this:
latitude longitude
40.64749 -73.97237
40.75362 -73.98377
40.80902 -73.94190
40.68514 -73.95976
40.79851 -73.94399
The second data set that I am using contains the census block code for coordinates in NY.
Latitude Longitude BlockCode
40.48 -74.280000 340230076002012
40.48 -74.276834 340230076005000
40.48 -74.273668 340230076003018
40.48 -74.270503 340230076003004
40.48 -74.267337 340230074021000
I first calculated a single_pt_haversine value for each latitude/longitude pair, that is, the haversine distance from a fixed reference point at (0, 0). I then inner joined the two data sets on single_pt_haversine, but there were no matches. After rounding single_pt_haversine to 3 decimal places I got some matches, but only about 300 rows were returned (out of the 48,895 in the first data set).
Is there a better way to do this? Or maybe a package to determine the census tract from coordinates in Python?
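One likely reason the join on single_pt_haversine finds so few matches is that it collapses each point to a single number: different points can lie at the same distance from (0, 0), and two nearby points almost never share exactly the same distance. A minimal sketch of an alternative, assuming the second data set is a set of sample points to match against: compute the haversine distance between the two points themselves and take the nearest row. The names haversine_km and nearest_block and the query point are made up for illustration; the rows are the five shown above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_block(lat, lon, rows):
    """Return the (lat, lon, block_code) row closest to (lat, lon)."""
    return min(rows, key=lambda row: haversine_km(lat, lon, row[0], row[1]))


# Rows copied from the second data set in the question.
rows = [
    (40.48, -74.280000, "340230076002012"),
    (40.48, -74.276834, "340230076005000"),
    (40.48, -74.273668, "340230076003018"),
    (40.48, -74.270503, "340230076003004"),
    (40.48, -74.267337, "340230074021000"),
]

# Hypothetical query point, chosen near the second row for illustration.
lat, lon, block_code = nearest_block(40.4801, -74.2770, rows)

# A 15-digit block code is state (2) + county (3) + tract (6) + block (4),
# so the first 11 digits identify the tract.
tract_code = block_code[:11]
```

This is a brute-force scan, so for 48,895 rows against a large second data set it may be slow and a spatial index would probably be needed. The block of the nearest sample point is also only an approximation of the block that actually contains the point.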
I think the Python package censusgeocode should be sufficient for your case. More documentation here: https://pypi.org/project/censusgeocode/
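A minimal sketch of how a lookup with censusgeocode might go, based on the coordinates(x=..., y=...) call shown in the package documentation. The helper names first_tract_geoid and lookup_tract are made up for illustration, and the response layout (a dict with a "Census Tracts" list whose entries carry a "GEOID") is an assumption that may differ between package versions, so check it against what your installed version returns.

```python
def first_tract_geoid(result):
    """Pull the first tract GEOID out of a geocoder response dict, or None.

    Assumes the response has a "Census Tracts" list of dicts with a "GEOID" key.
    """
    tracts = result.get("Census Tracts") or []
    return tracts[0].get("GEOID") if tracts else None


def lookup_tract(lat, lon):
    """Query the Census geocoder for one point (needs network access)."""
    # Imported here so the parsing helper above works without the package.
    import censusgeocode as cg  # third-party: pip install censusgeocode

    # Note the argument order: x is longitude, y is latitude.
    return first_tract_geoid(cg.coordinates(x=lon, y=lat))
```

For example, lookup_tract(40.64749, -73.97237) would query the first Airbnb row. Each call is a separate web request, so for all 48,895 rows it is probably worth de-duplicating coordinates and caching results.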