I'm trying to scrape biodiversity data from a specific website (http://www.faunaeur.org/?no_redirect=1). I have managed to get some results, but not as automatically as I hoped. The first part, browsing through the website, is done:
Setting up RSelenium:
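library(RSelenium)
# Download and unpack geckodriver (Windows 64-bit build); explicit binary mode keeps the zip intact
download.file("https://github.com/mozilla/geckodriver/releases/download/v0.11.1/geckodriver-v0.11.1-win64.zip", destfile = "./gecko.zip", mode = "wb")
unzip("./gecko.zip", exdir = ".", overwrite = TRUE)
# Fetch and start the standalone Selenium server (these two helpers are defunct in current RSelenium releases)
checkForServer(update = TRUE)
selfserv <- startServer()
# Open Firefox through geckodriver (marionette)
mybrowser1 <- remoteDriver(browserName = "firefox", extraCapabilities = list(marionette = TRUE))
mybrowser1$open()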
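Since `checkForServer()` and `startServer()` are no longer available in current RSelenium releases, here is a minimal sketch of the same setup using `rsDriver()`. It assumes a recent RSelenium plus local Java and Firefox installations. `rsDriver()` fetches the Selenium server and a matching geckodriver through the wdman package, so the manual download is not needed. The wrapper name `start_firefox()` and the port number are arbitrary choices, not part of the original script.

```r
# Sketch (assumption: recent RSelenium with rsDriver(); Java and Firefox installed).
# rsDriver() downloads/starts the Selenium server and geckodriver via wdman,
# then opens a browser session.
start_firefox <- function(port = 4545L) {
  driver <- RSelenium::rsDriver(browser = "firefox", port = port, verbose = FALSE)
  driver  # a list: $client is the opened remoteDriver, $server the Selenium process
}

# Usage (not run here because it launches a browser):
# rd <- start_firefox()
# mybrowser1 <- rd$client
# ... browse as below ...
# mybrowser1$close(); rd$server$stop()
```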
Then I start browsing (this example is for the Balearic Islands):
mybrowser1$navigate("http://www.faunaeur.org/distribution.php?current_form=species_list")
mybrowser1$findElement(using="xpath","//select[@name='taxon_rank']/option[@value='7']")$clickElement() # rank: Class
mybrowser1$findElement(using="xpath","//input[@name='taxon_name']")$sendKeysToElement(list('Oligochaeta')) # taxon name: Oligochaeta
mybrowser1$findElement(using="xpath","//select[@name='region']/option[@value='15']")$clickElement() # region: Balearic Islands
mybrowser1$findElement(using="xpath","//input[@name='include_doubtful_presence']")$clickElement() # also include doubtful records
mybrowser1$findElement(using="xpath","//input[@name='submit2']")$clickElement() # submit the search
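Because the script will be rerun by other people for other taxa and regions, the form-filling steps above could be wrapped in a function. This is a sketch that reuses only the XPath expressions and field names already shown. `option_xpath()` and `run_species_search()` are hypothetical helper names, and the option values for other ranks and regions would still have to be read from the search page.

```r
# Hypothetical helper: XPath for the <option> with a given value in a named <select>.
option_xpath <- function(select_name, value) {
  sprintf("//select[@name='%s']/option[@value='%s']", select_name, value)
}

# browser: an opened RSelenium remoteDriver (e.g. mybrowser1).
# rank and region are the site's option values ('7' = Class and
# '15' = Balearic Islands in the example above).
run_species_search <- function(browser, taxon_name, rank = "7", region = "15",
                               include_doubtful = TRUE) {
  browser$navigate("http://www.faunaeur.org/distribution.php?current_form=species_list")
  browser$findElement(using = "xpath", option_xpath("taxon_rank", rank))$clickElement()
  name_box <- browser$findElement(using = "xpath", "//input[@name='taxon_name']")
  name_box$clearElement()  # in case the browser restored an earlier entry
  name_box$sendKeysToElement(list(taxon_name))
  browser$findElement(using = "xpath", option_xpath("region", region))$clickElement()
  if (include_doubtful) {
    # Assumes the checkbox starts unticked, as in the original click sequence
    browser$findElement(using = "xpath", "//input[@name='include_doubtful_presence']")$clickElement()
  }
  browser$findElement(using = "xpath", "//input[@name='submit2']")$clickElement()
  invisible(browser)
}

# Usage: run_species_search(mybrowser1, "Oligochaeta", rank = "7", region = "15")
```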
From this point I can download the .xls file listing the 20 subspecies with:
mybrowser1$findElement(using = "xpath", "//a[@href='JavaScript:document.export_species_list.submit()']")$clickElement()
But that's not what I want: I don't want to rely on a click. Is it possible to download the file behind this JavaScript link directly into my R environment, or to scrape the table of the 20 subspecies from the page source using RSelenium?
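The link only calls `document.export_species_list.submit()`, which submits an HTML form named (or with id) `export_species_list`. One possible way around the click, untested against this site, is to read that form's action and hidden fields through RSelenium and post them with httr, reusing the browser's cookies. The assumptions are that the form is submitted with POST, that its state lives in hidden inputs plus the session cookies, and that httr and xml2 are installed. All function names below are hypothetical.

```r
# Reads one attribute from an RSelenium web element, NA if it is absent.
attr_or_na <- function(el, attr) {
  x <- unlist(el$getElementAttribute(attr))
  if (length(x) == 0) NA_character_ else as.character(x[1])
}

# Builds a form body from input names/values, dropping inputs without a name.
form_body <- function(names, values) {
  values[is.na(values)] <- ""
  keep <- !is.na(names) & nzchar(names)
  as.list(stats::setNames(values[keep], names[keep]))
}

# browser: an opened remoteDriver currently showing the result page.
download_export <- function(browser, destfile = "species_list.xls") {
  form_xpath <- "//form[@name='export_species_list' or @id='export_species_list']"
  form <- browser$findElement(using = "xpath", form_xpath)
  current <- browser$getCurrentUrl()[[1]]
  action <- attr_or_na(form, "action")
  # An empty action submits to the current page; a relative one is resolved against it
  action <- if (is.na(action) || !nzchar(action)) current else xml2::url_absolute(action, current)

  inputs <- browser$findElements(using = "xpath", paste0(form_xpath, "//input[@type='hidden']"))
  nms <- vapply(inputs, attr_or_na, character(1), attr = "name")
  vals <- vapply(inputs, attr_or_na, character(1), attr = "value")

  # Pass the browser's cookies along in case the search is stored server-side
  ck <- browser$getAllCookies()
  cookie_cfg <- if (length(ck) > 0) {
    httr::set_cookies(.cookies = stats::setNames(
      vapply(ck, function(k) as.character(k$value), character(1)),
      vapply(ck, function(k) as.character(k$name), character(1))))
  } else list()

  resp <- httr::POST(action, config = cookie_cfg, body = form_body(nms, vals),
                     encode = "form", httr::write_disk(destfile, overwrite = TRUE))
  httr::stop_for_status(resp)
  destfile
}
```

If the form also contains visible inputs (checkboxes, selects), those would need to be added to the body as well; comparing with the browser's own network request is the quickest way to check.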
I tried both approaches but got stuck. The biggest problem is that the page is a temporary "result page", and I can't find any @value, @id, @name or @class attribute that identifies the table I need.
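A table without an id or class can still be reached by parsing the whole page source and selecting among all `<table>` elements. The following sketch assumes that the species list is rendered as an HTML table, that it is the table with the most rows on the result page, and that rvest (1.0 or later) and xml2 are installed. If the site uses nested layout tables, choosing by index or by a known column name would be safer. `largest_table()` and `scrape_result_table()` are hypothetical names.

```r
# Returns the data frame with the most rows from a list of data frames.
largest_table <- function(tables) {
  if (length(tables) == 0) stop("no tables found")
  tables[[which.max(vapply(tables, nrow, integer(1)))]]
}

# browser: an opened remoteDriver currently showing the result page.
scrape_result_table <- function(browser) {
  src <- browser$getPageSource()[[1]]  # HTML of the page currently displayed
  page <- xml2::read_html(src)
  tables <- rvest::html_table(rvest::html_elements(page, "table"))
  largest_table(tables)
}

# Usage, after submitting the search: species <- scrape_result_table(mybrowser1)
```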
Any clue on a fully automated way of doing this in R? I need it in this form because the script will later be run by other people who need to generate the results themselves. Thanks in advance!
If you just want the table that is displayed on the website, this can be done without RSelenium, via httr, as follows:
Which gives you: