I would like to download information from a site with the rvest package. The information I need is contained in the HTML element <div class="col-sm-8">. How can I do this?
My usual approach doesn't work:
library(rvest)
url <- "myurl"
pagina <- read_html(url)
titoli <- pagina %>%
  html_nodes("col-sm-8") %>%
  html_text()
The contents of this page are rendered with JavaScript, and read_html() does not execute any JavaScript. You'll either have to use a scraping technique that renders the whole page in a headless browser (e.g. RSelenium), or write requests against their API (e.g. with httr). The development version of rvest provides read_html_live(), which works for this on my end.
Note: In your code you'd need to prefix the CSS selector with a . to tell the parser to look for elements of that class.
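
To illustrate both points, here is a minimal sketch. The inline HTML is a hypothetical stand-in for the real page (which I can't reproduce here), and scrape_live() is just an illustrative helper name, not part of rvest. It assumes an rvest version that includes read_html_live(), which in turn relies on the chromote package and a local Chrome installation.

```r
library(rvest)

# Hypothetical static snippet standing in for the real page.
pagina <- minimal_html('
  <div class="col-sm-8">First title</div>
  <div class="col-sm-4">Sidebar</div>
')

# Without the dot, the selector looks for a <col-sm-8> tag and finds nothing.
senza_punto <- pagina %>%
  html_elements("col-sm-8") %>%
  html_text()

# With the dot, the selector matches elements whose class is col-sm-8.
titoli <- pagina %>%
  html_elements(".col-sm-8") %>%
  html_text()

# Illustrative helper for a JavaScript-rendered page: read_html_live() loads
# the page in a headless browser so that scripts run before extraction.
# It is only defined here, not called, since it needs Chrome and a real URL.
scrape_live <- function(url) {
  read_html_live(url) %>%
    html_elements(".col-sm-8") %>%
    html_text()
}
```

On the static snippet, senza_punto comes back empty while titoli contains the text of the matching div. For the real site you would call scrape_live("myurl") with the actual address; depending on how slowly the page loads, you may still need to wait before the elements appear.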