i want to scrape the entries in this table. it is apparently populated by javascript after the page loads, so rather than scraping (with something like webdriver), i'd like to directly request the data from whatever service the javascript is talking to.
using chrome dev tools' network tab, i thought i'd narrowed it down to an xhr POST to https://www.oregon.gov/oha/ERD/_vti_bin/client.svc/ProcessQuery, but the response shown doesn't look related, and none of the other network activity items seem to be either.
how do i track down exactly what request is populating the table?
HTML5 introduced web storage (localStorage and sessionStorage), which, like cookies, caches data locally in the browser. this can mean the page makes no data request at all after the first load, because it reads the cached copy instead. in chrome dev tools, go to the application tab, and under storage, look for a key that holds the data you want. if it's there, clear the storage and refresh, and you'll see either an xhr or fetch[1] request in the network tab that got the data. you can right-click that request and copy it as a curl command to request the data directly, with no scraping. you might worry that the service will block access from outside its approved web front end, but cors can't stop you: it's enforced by browsers, not by the server, so curl and other non-browser clients simply ignore it.
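once you've copied the request as curl, you can just run that command in a shell. if you'd rather pull the data from a script, here is a minimal python sketch that turns the copied command into a `urllib.request.Request`. `request_from_curl` and `fetch` are hypothetical helpers, not a library api, and the url in the example is a placeholder. it assumes chrome's bash-style "copy as cURL" output and only understands a handful of flags (`-H`, `-b`, `-A`, `-e`, `-X`, and the `--data*` family). it does not handle `$'...'` quoting, `--data-binary @file`, or windows `^` line continuations, and it raises an error on anything it can't place rather than guessing.

```python
import shlex
import urllib.request

# Flags that carry a request body. Chrome's "Copy as cURL" output has changed
# across versions (e.g. --data-binary in older ones, --data-raw in newer ones).
DATA_FLAGS = {"-d", "--data", "--data-raw", "--data-binary", "--data-ascii"}
# Flags whose argument maps directly onto a single HTTP header.
HEADER_FLAGS = {"-b": "Cookie", "--cookie": "Cookie",
                "-A": "User-Agent", "--user-agent": "User-Agent",
                "-e": "Referer", "--referer": "Referer"}


def request_from_curl(command: str) -> urllib.request.Request:
    """Rebuild a urllib Request from a copied curl command (minimal sketch)."""
    # shlex does not understand bash line continuations, so join them first.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url, method, data = None, None, None
    headers = {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            # urllib does not decompress responses, so don't ask for gzip/br.
            if name.strip().lower() != "accept-encoding":
                headers[name.strip()] = value.strip()
            i += 2
        elif tok in HEADER_FLAGS:
            headers[HEADER_FLAGS[tok]] = tokens[i + 1]
            i += 2
        elif tok in ("-X", "--request"):
            method = tokens[i + 1].upper()
            i += 2
        elif tok in DATA_FLAGS:
            data = tokens[i + 1].encode("utf-8")
            i += 2
        elif tok.startswith("-"):
            # Assumed to be an argument-less flag such as --compressed.
            i += 1
        elif url is None:
            url = tok
            i += 1
        else:
            # Probably the argument of a flag this sketch doesn't know about.
            raise ValueError(f"unsupported curl argument: {tok!r}")

    if url is None:
        raise ValueError("no URL found in curl command")
    # Like curl: POST when a body is present, GET otherwise, unless -X says so.
    if method is None:
        method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def fetch(command: str, timeout: float = 30.0) -> bytes:
    """Send the rebuilt request and return the raw response body."""
    with urllib.request.urlopen(request_from_curl(command), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Placeholder command; paste the one copied from the network tab instead.
    copied = """curl 'https://example.com/api/table-data' \\
  -H 'accept: application/json' \\
  -H 'content-type: application/json' \\
  --data-raw '{"page":1}' \\
  --compressed"""
    req = request_from_curl(copied)
    print(req.get_method(), req.full_url, req.data)
    # prints: POST https://example.com/api/table-data b'{"page":1}'
    # body = fetch(copied)  # uncomment to perform the real network request
```

copying every header (including cookies) keeps whatever session state the browser had, which is usually what makes the replayed request return the same data the page got.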
[1] fetch is an improved xhr, available since 2015.

thank you to @sideshowbarker for pointing me to sessionStorage and answering my cors questions.