This website, http://a810-bisweb.nyc.gov/bisweb/bispi00.jsp, is for searching NYC building application information. Under the "Application Searches" section there is a "BIS Job Number:" field, and the information I want to extract is on the new page that loads after I enter a job number and click "go".
For example, from the dataset https://data.cityofnewyork.us/Housing-Development/DOB-Job-Application-Filings/ic3t-wcy2, I pick job number 220286232, go to the first website, put the number in "BIS Job Number:" and click "go". Now I get a new page. The information I want is the "Applicant of Record Information" (including applicant contact information).
I'm stuck here. How can I extract this applicant information for each job number?
I am very new to web scraping. I learned how to extract information from an entire page using rvest, but I'm not familiar with web scraping across different websites.
Thank you.
Update: I tried to use the Socrata API, but I found that the applicant contact information doesn't have its own API fields. If there is no API field for that information (but other information on that page has fields), does it mean I can't use the API to solve this problem?
Thank you!
On that page, at the top right, click on the "API" tab. A modal dialog box titled "Access this Dataset via SODA API" will pop up. Copy the link, in this case https://data.cityofnewyork.us/resource/rvhx-8trz.json. This is a URL which provides the data directly in machine-readable JSON format. However, only 1000 records are fetched at a time.
So maybe add appropriate $offset parameters. See the Socrata documentation. The City of New York seems to use this software for their Open Data platform. Maybe call them this way in your R script:
(untested for higher offsets)
Use jsonlite for converting JSON into R data frames.
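The paging idea can be sketched roughly as follows. This is only an illustration, not code from the original answer: the helper names page_url and fetch_all, the page_size and max_pages arguments, and the use of $limit together with $offset are assumptions made for this sketch, and it has not been run against the live endpoint.

```r
# Endpoint copied from the "API" tab, as described in the answer.
base_url <- "https://data.cityofnewyork.us/resource/rvhx-8trz.json"

# Build the URL for one page of results.
# page_url, offset and page_size are names chosen for this sketch.
# as.integer() keeps large offsets from being printed as "1e+05".
page_url <- function(offset, page_size = 1000) {
  sprintf("%s?$limit=%d&$offset=%d",
          base_url, as.integer(page_size), as.integer(offset))
}

# Fetch page after page until one comes back shorter than page_size.
# max_pages is a safety cap (an assumption) so the loop always ends.
fetch_all <- function(page_size = 1000, max_pages = 5) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    offset <- (i - 1) * page_size
    page <- jsonlite::fromJSON(page_url(offset, page_size))
    # An empty JSON array is parsed as an empty list, so stop there.
    if (length(page) == 0 || nrow(page) == 0) break
    pages[[i]] <- page
    if (nrow(page) < page_size) break
  }
  pages  # a list of data frames, one per page
}
```

Calling fetch_all() returns one data frame per page. Pages may not share exactly the same columns, so something like jsonlite::rbind_pages(pages) is probably a safer way to combine them than a plain rbind.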