i want to scrape the entries in this table. it is apparently populated by javascript after the page loads, so rather than scraping (with something like webdriver), i'd like to directly request the data from whatever service the javascript is talking to.
using chrome dev tools' network tab, i thought i'd narrowed it down to an xhr POST to https://www.oregon.gov/oha/ERD/_vti_bin/client.svc/ProcessQuery, but the response shown doesn't look related, and none of the other network activity items seem to be either.
how do i track down exactly what request is populating the table?
HTML5 introduced web storage, which, like cookies, caches data locally. this can prevent data requests after first loading a site. in chrome dev tools, go to the application tab, and under storage, look for a key that has the data you want. if it's there, you can clear the storage, refresh, and then you'll see either an xhr or fetch[1] request in the network tab that got the data. you can right-click the request and copy it as a curl command to request the data directly with no scraping. you might worry that the service will prevent access from outside its approved web front end, but cors can't stop you because it is only enforced by browsers.

[1] fetch is an improved xhr available since 2015.

thank you to @sideshowbarker for pointing me to sessionStorage and answering my cors questions.
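
if you'd rather replay the request from a script than from curl, here is a minimal python sketch using only the standard library. the url and headers are placeholders i made up for illustration, not the real endpoint behind the table: swap in whatever the copied curl command shows. it also assumes the service answers with json, which may not hold for every site.

```python
import json
import urllib.request

# hypothetical values: replace them with the url and headers
# from the request you copied out of the network tab.
DATA_URL = "https://example.com/api/table-data"
HEADERS = {
    "Accept": "application/json",
    "Content-Type": "application/json",
}


def build_request(url, payload=None, headers=None):
    """build a GET request, or a POST when a json payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(url, data=data, headers=headers or {})


def fetch_table(url=DATA_URL, payload=None, headers=HEADERS):
    """send the request and decode the body (assumes a json response)."""
    request = build_request(url, payload, headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

calling fetch_table() sends a GET, and passing a payload dict turns it into a POST, which is what you'd want if the copied request was an xhr POST with a body. no Origin header is needed, since the script isn't a browser and cors is not checked here.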