I'm trying to download a PDF from the National Information Center via RCurl, but I've been having some trouble. For the example URL in the script below, I want the PDF corresponding to the default settings, except for "Report Format", which should be "PDF". When I run the following script, it saves the file associated with selecting the other radio buttons ("Parent(s) of..."/HMDA) rather than the defaults. I tried adding the input elements for the default options to params, but it didn't change anything. Could somebody help me identify the problem? Thanks.
```r
library(RCurl)

# Reuse one handle so the session cookie is kept between the GET and the POST
curl = getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', curl = curl)

# Intended to select "PDF" under Report Format
params = list(rbRptFormatPDF = 'rbRptFormatPDF')
url = 'https://www.ffiec.gov/nicpubweb/nicweb/OrgHierarchySearchForm.aspx?parID_RSSD=2162966&parDT_END=99991231'

# GET the form first and scrape the ASP.NET state fields out of the HTML
html = getURL(url, curl = curl)
viewstate = sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html)
event = sub('.*id="__EVENTVALIDATION" value="([0-9a-zA-Z+/=]*).*', '\\1', html)
params[['__VIEWSTATE']] = viewstate
params[['__EVENTVALIDATION']] = event
params[['btnSubmit']] = 'Submit'

# POST the form back and save the response body to disk
result = postForm(url, .params = params, curl = curl, style = 'POST')
writeBin(as.vector(result), 'test.pdf')
```
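One difference from a browser is that the script only posts the fields it names explicitly. Under HTML form submission rules, a browser sends every named hidden input plus the checked button of each radio group (as the group's name attribute paired with the button's value attribute), so the server always sees the defaults. Below is a minimal sketch of collecting those defaults from the fetched page with base R regular expressions. It assumes double-quoted attributes inside <input ...> tags, and form_defaults is a hypothetical helper, not part of RCurl:

```r
# Hypothetical helper: build the name -> value pairs a browser would submit
# for the hidden inputs and the pre-checked radio buttons found in `html`.
# Assumes attributes are double-quoted and appear inside <input ...> tags.
form_defaults <- function(html) {
  tags <- regmatches(html, gregexpr('<input[^>]*>', html, ignore.case = TRUE))[[1]]
  is_hidden <- grepl('type="hidden"', tags, ignore.case = TRUE)
  is_checked_radio <- grepl('type="radio"', tags, ignore.case = TRUE) &
    grepl('[[:space:]]checked', tags, ignore.case = TRUE)
  tags <- tags[is_hidden | is_checked_radio]
  # Pull one attribute's value out of a tag ("" if the attribute is absent)
  attr_value <- function(tag, attr) {
    pattern <- paste0('[[:space:]]', attr, '="([^"]*)"')
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else ""
  }
  field_names <- vapply(tags, attr_value, character(1), attr = 'name', USE.NAMES = FALSE)
  field_values <- vapply(tags, attr_value, character(1), attr = 'value', USE.NAMES = FALSE)
  setNames(as.list(field_values), field_names)
}
```

Something like params = modifyList(form_defaults(html), params) would keep the script's own entries on top of the page defaults. For the override to take effect, the report-format entry would have to be keyed by its radio group's name attribute rather than the button's id. That name is an assumption to check in the page source; it isn't shown in the question.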
Does this provide the correct PDF?
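A quick first check is whether the server returned a PDF at all rather than an HTML page. PDF files start with the bytes "%PDF". This only confirms the file type; it can't tell which report options were applied, so the saved file still has to be opened and compared. A minimal sketch, where is_pdf is a hypothetical helper:

```r
# Hypothetical helper: TRUE if a response body starts with the PDF magic
# bytes "%PDF", i.e. the server returned a PDF rather than an HTML page.
is_pdf <- function(body) {
  # postForm may return text (e.g. an HTML error page) instead of raw bytes
  if (is.character(body)) body <- charToRaw(body[1])
  length(body) >= 4 && identical(body[1:4], charToRaw("%PDF"))
}
```

For example: if (is_pdf(result)) writeBin(as.vector(result), 'test.pdf') else message('Server did not return a PDF').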