I am trying to get a list of companies and jobs into a table from indeed.com's job board.
I am using the rvest package with a base URL of http://www.indeed.com/jobs?q=proprietary+trader& and this code:
install.packages("gtools")
install.packages("rvest")
library(rvest)
library(gtools)
mydata = read.csv("setup.csv", header=TRUE)
url_base <- "http://www.indeed.com/jobs?q=proprietary+trader&"
names <- mydata$Page
results <- data.frame()
for (name in names) {
  url <- paste0(url_base, name)
  title.results <- url %>%
    html() %>%
    html_nodes(".jobtitle") %>%
    html_text()
  company.results <- url %>%
    html() %>%
    html_nodes(".company") %>%
    html_text()
  results <- smartbind(company.results, title.results)
  results3 <- data.frame(company = company.results, title = title.results)
}
new <- results(Company=company, Title=title)
I then loop over the pages and concatenate the results. For some reason it is not grabbing all of the jobs, and it is mixing up the companies and jobs.
It might be because you make two separate requests to the page. You should change the middle part of your code to:
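Something along these lines, as a rough sketch rather than the exact code: parse each page once and pull both the titles and the companies out of that same parsed document, so the two vectors cannot come from different responses. The helper name extract_jobs and the inline sample HTML are made up for illustration (the markup is not Indeed's real page), and read_html() is used because recent rvest releases replaced html() with it.

```r
library(rvest)  # provides %>%, read_html(), html_nodes(), html_text()

# Hypothetical helper: takes ONE parsed page and returns companies and
# titles extracted from that same document.
extract_jobs <- function(page) {
  company <- page %>% html_nodes(".company") %>% html_text(trim = TRUE)
  title   <- page %>% html_nodes(".jobtitle") %>% html_text(trim = TRUE)
  # Fail loudly instead of silently pairing the wrong rows.
  stopifnot(length(company) == length(title))
  data.frame(company = company, title = title, stringsAsFactors = FALSE)
}

# Made-up stand-in for a results page so the sketch runs without a request.
sample_page <- read_html(
  '<html><body>
     <div><a class="jobtitle">Proprietary Trader</a>
          <span class="company">Acme Capital</span></div>
     <div><a class="jobtitle">Junior Trader</a>
          <span class="company">Example Trading</span></div>
   </body></html>'
)
sample_jobs <- extract_jobs(sample_page)

# In the loop from the question this would become:
# for (name in names) {
#   page <- read_html(paste0(url_base, name))      # one request per page
#   results <- rbind(results, extract_jobs(page))  # accumulate rows
# }
```

Using rbind() on the per-page data frame keeps each company next to its title, whereas binding the two character vectors separately stacks them as rows and overwrites results on every pass.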
When I do that, it seems to give me 10 jobs and companies that match. Otherwise, can you give an example of a query URL that doesn't work?