I am trying to scrape data from the following website, which has four dropdown menus. After an option is selected in each dropdown, the page shows a table that I want to scrape. I want to combine the tables from all combinations of dropdown options into one final table.
https://hindi.iocl.com/lpgdistributors.aspx
I am using the RSelenium package. However, as I am very new to web scraping, I could not work out how to loop over the options of the four dropdowns to build the final table.
I adapted the code from an earlier discussion on web scraping as follows.
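library(RSelenium)
library(rvest)
library(stringr) # provides str_extract_all() and str_split() used below
# Windows only: kill any leftover Java process from an earlier Selenium session
system("taskkill /im java.exe /f", intern=FALSE, ignore.stdout=FALSE)
rD <- rsDriver(browser = "firefox") # specify the browser type you want Selenium to open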
remDr <- rD$client
remDr$navigate("https://hindi.iocl.com/lpgdistributors.aspx") # navigates to webpage
# select first dropdown list
option <- remDr$findElement(using='id', value="cmbState")
# get all option values from dropdown list
# note: getPageSource() returns the source of the whole page, not just this element,
# so the pattern matches every 4-digit number starting with 1 anywhere on the page
option_values <- option$getPageSource()[[1]] %>%
  str_extract_all("1[0-9]{3}")
# select 2nd dropdown list
option <- remDr$findElement(using='id', value="cmbDistrict")
# same extraction, stored under its own name so the state values are not overwritten
district_values <- option$getPageSource()[[1]] %>%
  str_extract_all("1[0-9]{3}")
# select 3rd dropdown list
option <- remDr$findElement(using='id', value="cmbMarket")
market_values <- option$getPageSource()[[1]] %>%
  str_extract_all("1[0-9]{3}")
# select 4th dropdown list
option2 <- remDr$findElement(using='id', value="cmbArea")
# get the visible option labels, one per line of the element text
option_values_2 <- option2$getElementText()[[1]] %>%
  str_split("\\n") %>%
  unlist()
#### select one option per dropdown; a loop would replace the hard-coded values below
# each XPath is scoped to its dropdown so that an option with the same value
# in another dropdown is not matched by mistake
option <- remDr$findElement(using = 'xpath', "//*[@id='cmbState']//option[@value='1']") # state: loop over values in option_values
option$clickElement()
# change dropdown selection
option2 <- remDr$findElement(using = 'xpath', "//*[@id='cmbDistrict']//option[@value='185']") # district
option2$clickElement()
# change dropdown selection
option3 <- remDr$findElement(using = 'xpath', "//*[@id='cmbMarket']//option[@value='2314']") # market
option3$clickElement()
# change dropdown selection
option4 <- remDr$findElement(using = 'xpath', "//*[@id='cmbArea']//option[@value='57']") # area
option4$clickElement()
# click submit
submit <- remDr$findElement(using='id', value="btnSearch")
submit$clickElement()
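# get table
tb <- remDr$findElement(using='id', value="grdDistributors")
# parse only this element's HTML; html_table() returns a list of data frames
tables <- tb$getElementAttribute("outerHTML")[[1]] %>%
  read_html() %>%
  html_table(fill = TRUE)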
Here is a partial solution using RSelenium.
Get the list of all the states:
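A minimal sketch of this step, not the answerer's original code. It assumes `remDr` is an RSelenium client already opened on the page and that the state dropdown has the id `cmbState`, as in the question; `get_state_options` is a hypothetical helper name.

```r
# Hypothetical helper: read the value and visible label of every <option>
# inside the state dropdown (id "cmbState" is taken from the question).
get_state_options <- function(remDr) {
  opts <- remDr$findElements(using = "xpath",
                             value = "//*[@id='cmbState']//option")
  data.frame(
    # getElementAttribute() and getElementText() each return a list,
    # so [[1]] pulls out the single string
    value = vapply(opts, function(o) o$getElementAttribute("value")[[1]], character(1)),
    label = vapply(opts, function(o) o$getElementText()[[1]], character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage, once the browser session from the question is running:
# states <- get_state_options(remDr)
```

The first row is likely to be a placeholder entry such as "select a state"; if so, drop it before looping.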
Now you have to loop through the states to get the list of all the districts. One example:
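A rough sketch of such a loop, again not the answerer's original code. It assumes that choosing a state makes the page repopulate the district dropdown (`cmbDistrict`) and that a fixed pause is long enough for that to happen; `collect_districts` and its `wait` argument are hypothetical names.

```r
# Hypothetical helper: for each state value, select that state, wait for the
# district dropdown to refresh, and record the district option values.
collect_districts <- function(remDr, state_values, wait = 2) {
  rows <- lapply(state_values, function(s) {
    xp <- sprintf("//*[@id='cmbState']//option[@value='%s']", s)
    remDr$findElement(using = "xpath", value = xp)$clickElement()
    Sys.sleep(wait) # crude pause; assumes the page updates within `wait` seconds
    # look the options up again after the update, since the old elements go stale
    opts <- remDr$findElements(using = "xpath",
                               value = "//*[@id='cmbDistrict']//option")
    if (length(opts) == 0) return(NULL)
    data.frame(
      state = s,
      district = vapply(opts, function(o) o$getElementAttribute("value")[[1]], character(1)),
      stringsAsFactors = FALSE
    )
  })
  # stack the per-state data frames, skipping states with no districts
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Usage, with state values read from the state dropdown:
# districts <- collect_districts(remDr, state_values = c("1", "2"))
```

The same pattern can be nested for the market and area dropdowns, with the search button clicked and the table read inside the innermost loop.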