I'm trying to scrape the locations of product reviewers from Amazon. For example, from this webpage
I need to get HAINESVILLE, ILLINOIS, United States.
I use the rvest package for web scraping.
Here is what I did:
library(rvest)
url='https://www.amazon.com/gp/profile/amzn1.account.AH55KF4JK5IKKJ77MPOLHOR4YAQQ/ref=cm_cr_dp_d_gw_tr?ie=UTF8'
page = read_html(url)
I got the following error:
Error in open.connection(x, "rb") : HTTP error 403.
But the following works:
con <- url(url, "rb")
page = read_html(con)
However, I could not extract any text from the page read this way. For example, I want to extract the reviewer's location:
page %>%
html_nodes("#customer-profile-name-header .a-size-base a-color-base")%>%
html_text()
I got nothing:
character(0)
Can anyone help me figure out what I did wrong? Thanks a lot in advance.
This should work:
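One thing that looks wrong in the question is the selector itself. In ".a-size-base a-color-base" the space makes a-color-base a descendant tag name rather than a second class, so it can never match. Below is a minimal sketch of that point on made-up markup. The HTML string is a hypothetical stand-in, not Amazon's actual page, and I'm assuming the location sits in an element carrying both classes.

```r
library(rvest)

# Hypothetical markup standing in for the profile page (an assumption,
# not the real structure of the Amazon page).
html <- '<div id="customer-profile-name-header">
  <span class="a-size-base a-color-base">HAINESVILLE, ILLINOIS, United States</span>
</div>'
page <- read_html(html)

# Selector from the question: after the space, "a-color-base" is read as
# an element name, so no node matches.
broken <- page %>%
  html_nodes("#customer-profile-name-header .a-size-base a-color-base") %>%
  html_text()

# Chaining the classes without a space matches one element that has both.
fixed <- page %>%
  html_nodes("#customer-profile-name-header .a-size-base.a-color-base") %>%
  html_text()

print(broken)
print(fixed)
```

If the corrected selector still returns character(0) on the real page, the location is probably filled in by JavaScript after the page loads, in which case it is not in the HTML that read_html() receives and no selector will find it.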