Scraping an HTML table with rvest not working

I am trying to retrieve the table under the 'Results Table' tab on this webpage: https://bccsu-drugsense.onrender.com/

I'm encountering issues when trying to inspect the elements of this table.

library(tidyverse)
library(rvest)

html_code <- read_html("https://bccsu-drugsense.onrender.com/")

html_code

## {html_document}
## <html>
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<!-- Global site tag (gtag.js) - Google Analytics --><script async src="https://www ...
## [2] <body>\n        \n<div id="react-entry-point">\n    <div class="_dash-loading">\n        Loading...\n    </div>\n</div>\n\n        <footer><script id="_dash-con ...

Usually this would return the page's HTML, but here it does not match what I see in the browser's devtools. The only content in the <body> section is a <div class="_dash-loading"> placeholder, so when I try to select elements of the page I get nothing back (an empty node set).

html_code %>% html_elements("div") 
## {xml_nodeset (2)}
## [1] <div id="react-entry-point">\n    <div class="_dash-loading">\n        Loading...\n    </div>\n</div>
## [2] <div class="_dash-loading">\n        Loading...\n    </div>

html_code %>% html_elements("table") 
## {xml_nodeset (0)}

I'm not sure what Loading... means in this context, so I would appreciate any advice or workaround. Thank you.

Best answer, by Till:

The contents of this page are rendered via JavaScript. rvest::read_html() cannot capture them because it only downloads the initial HTML and does not execute the JavaScript, which is why you only see the Loading... placeholder. The current development build of rvest has a new function for this: read_html_live().

It uses the chromote package, which relies on Google Chrome being installed on your system.
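The Loading... text is the placeholder the app displays until its JavaScript has run. The class names in your output (_dash-loading, react-entry-point) suggest a Plotly Dash app, although that is an inference. The minimal offline sketch below uses a trimmed copy of the shell that read_html() returned and a hypothetical helper, looks_like_js_shell() (not part of rvest), to show how you could detect this situation before reaching for a browser:

```r
library(rvest)

# Trimmed copy of the static shell that read_html() returned: the <body>
# holds only the React mount point and a "Loading..." placeholder.
shell <- minimal_html('
  <div id="react-entry-point">
    <div class="_dash-loading">Loading...</div>
  </div>
')

# Hypothetical helper (not part of rvest): TRUE when the static HTML looks
# like an empty JavaScript app shell, i.e. a mount point and loader but no <table>.
looks_like_js_shell <- function(doc) {
  has_mount  <- length(html_elements(doc, "#react-entry-point")) > 0
  has_loader <- length(html_elements(doc, "._dash-loading")) > 0
  no_tables  <- length(html_elements(doc, "table")) == 0
  has_mount && has_loader && no_tables
}

looks_like_js_shell(shell)
#> [1] TRUE

# The same check against the live URL (needs network access):
# looks_like_js_shell(read_html("https://bccsu-drugsense.onrender.com/"))
```

If the check returns TRUE, static parsing will not see the table, and a browser-driven approach such as read_html_live() is needed.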

To install the dev version of rvest:

remotes::install_github("tidyverse/rvest")
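Before installing, you may want to check whether the rvest you already have exports read_html_live() and whether chromote is installed. This sketch only inspects installed packages; the variable names are illustrative.

```r
# Does the installed rvest already export read_html_live()?
rvest_has_live <- requireNamespace("rvest", quietly = TRUE) &&
  "read_html_live" %in% getNamespaceExports("rvest")

# read_html_live() drives Chrome through the chromote package.
has_chromote <- requireNamespace("chromote", quietly = TRUE)

if (!rvest_has_live) {
  message("read_html_live() not found; install rvest from GitHub:")
  message('  remotes::install_github("tidyverse/rvest")')
}
if (!has_chromote) {
  message('chromote is not installed: install.packages("chromote")')
}
```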

With a few small changes to your code, you can get to the table. Instead of read_html(), we use read_html_live(). We then call html_code$click() to emulate a click on the tab that holds the table. Finally, rvest::html_table() extracts the tables from an HTML document and turns them into data frames.

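library(tidyverse)
library(rvest)

# Open the page in a live Chrome session so its JavaScript runs and renders the app
html_code <- read_html_live("https://bccsu-drugsense.onrender.com/")

# Click the third nav tab, the one holding the results table
html_code$click("li.nav-item:nth-child(3) > a:nth-child(1)")

# Extract every <table> currently on the page as a list of data frames.
# If it comes back empty, the tab may not have finished rendering yet;
# waiting briefly (e.g. Sys.sleep(2)) before this call may help.
html_table(html_code)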

(Note: This works for me on Linux. On Windows read_html_live() is not working for me yet.)
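To see what html_table() gives back without needing Chrome, here is an offline sketch on a small static table. The column names and values are made up for illustration and do not come from the site.

```r
library(rvest)

# Made-up static table standing in for the rendered "Results Table" tab.
page <- minimal_html('
  <table>
    <tr><th>sample</th><th>result</th></tr>
    <tr><td>A</td><td>positive</td></tr>
    <tr><td>B</td><td>negative</td></tr>
  </table>
')

# On a whole document, html_table() returns a list with one tibble per <table>.
tables <- html_table(page)
length(tables)
#> [1] 1

# The <th> row becomes the column names; pick the table you need by index.
results <- tables[[1]]
names(results)
#> [1] "sample" "result"
```

On the live page, several tables may be present, so it is worth checking length() and picking the right index rather than assuming the first one.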