Web-scraping: GET request returns a string - how can I read/process it in R?


I'm sending a GET request to a governmental page in India with the following URL: https://pmfby.gov.in/landingPage/districtState?stateID=2F3DE245-46E6-4C4D-9297-C0B23C803B15&sssyID=02012223

On their webpage (https://pmfby.gov.in/ -> "Insurance Premium Calculator"), this endpoint returns all districts for the selected state (rendered via JavaScript).

I used the following R code to reproduce this:

content(httr::GET("https://pmfby.gov.in/landingPage/districtState?stateID=2F3DE245-46E6-4C4D-9297-C0B23C803B15&sssyID=02012223"))

I expected to get an HTML- or JSON-structured response with a list of those districts. What I actually get back, however, is a string that starts with "09f2ba53495a95c8d6189182d996252553a088b548d4f3c7caf1e195ff....".

I don't even know what format this is, and I'm unable to convert/process it in R. Is there any solution to my problem?
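A string of nothing but hex characters like the one above is usually an opaque (encrypted or obfuscated) payload rather than parseable data. A quick base-R check can confirm whether the response body is a plain hex blob or something structured like HTML/JSON (the helper name below is purely illustrative):

```r
# Quick check: is the returned string a bare hex blob rather than HTML/JSON?
# A hex-only payload is typically an encrypted/obfuscated token, not data
# that jsonlite or rvest could parse directly.
is_hex_blob <- function(x) {
  grepl("^[0-9a-fA-F]+$", x)
}

is_hex_blob("09f2ba53495a95c8")          # TRUE: opaque hex payload
is_hex_blob('{"districts": ["Pune"]}')   # FALSE: looks like JSON instead
```

If the check returns TRUE, the server is most likely protecting the endpoint, and you'll need a browser-driven approach (see the answer below) rather than a raw GET.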

1 answer

NoAlibi:

Thanks a lot for all the advice. What interests me is the underlying data, which is rendered with JavaScript and which I tried to grab via a GET request. However, it does not seem to be possible that easily.

Grzegorz's advice to use rvest::read_html_live() put me on the right track. However, I ended up using RSelenium together with XPath to fill in the form and grab the data once rendered. The key to success was basically this line:

remoteDriver$findElement(using = 'xpath', value = '//*[@class="col-lg-4 col-md-4 col-sm-4 col-xs-12"][1]/div/select')$sendKeysToElement(list("Rabi"))$clickElement()
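Fleshed out, the flow might look like the sketch below. The XPath class is copied from the line above; the browser choice, the fixed wait, and the helper names (dropdown_xpath, fetch_rendered_page) are assumptions for illustration, not the exact setup used:

```r
# Sketch of the RSelenium flow. Assumes the RSelenium package is installed
# and a local Selenium driver (here Firefox/geckodriver) is available.

# Build the XPath for the n-th dropdown inside the form's Bootstrap columns
# (class attribute copied from the answer's key line).
dropdown_xpath <- function(n) {
  sprintf('//*[@class="col-lg-4 col-md-4 col-sm-4 col-xs-12"][%d]/div/select', n)
}

fetch_rendered_page <- function() {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  on.exit(driver$server$stop(), add = TRUE)
  remote <- driver$client

  remote$navigate("https://pmfby.gov.in/")
  Sys.sleep(5)  # crude wait for the JavaScript to render the form

  # Select "Rabi" in the first dropdown, as in the answer's key line
  sel <- remote$findElement(using = "xpath", value = dropdown_xpath(1))
  sel$sendKeysToElement(list("Rabi"))
  sel$clickElement()

  # Return the rendered HTML, which rvest::read_html() can then parse
  remote$getPageSource()[[1]]
}
```

The same dropdown_xpath(n) helper can then target the state and district dropdowns by index before scraping the populated options.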

Cheers, NoAlibi