I want to scrape data from a website; however, I keep getting HTTP Error 405: Not Allowed. What am I doing wrong?
(I have looked at the documentation and tried its example code with only my URL in place of the example's; I still get the same error.)
Here's the code:
import requests, urllib
from urllib.request import Request, urlopen
list_url = ["http://www.glassdoor.com/Reviews/WhiteWave-Reviews-E9768.htm"]
for url in list_url:
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urllib.request.urlopen(req).read()
If I omit the User-Agent header, I get HTTP Error 403: Forbidden instead.
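To see exactly what the server sends back in each case, a minimal sketch along these lines could be used. The helper names (`build_request`, `describe_http_error`, `try_fetch`) and the 10-second timeout are illustrative assumptions, not part of my original code. A 405 response is supposed to carry an `Allow` header listing the permitted methods, though the server may not send one.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request, optionally with a User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def describe_http_error(err):
    """Summarise an HTTPError as 'code reason', plus the Allow header if present."""
    allow = err.headers.get("Allow") if err.headers else None
    summary = f"{err.code} {err.reason}"
    return f"{summary} (Allow: {allow})" if allow else summary


def try_fetch(url, user_agent=None):
    """Return the body as bytes, or None after printing what the server said."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:  # timeout is an assumption
            return resp.read()
    except urllib.error.HTTPError as err:
        print(url, "->", describe_http_error(err))
        return None
```

Calling `try_fetch(url)` and then `try_fetch(url, "Mozilla/5.0")` would print the two responses side by side, which makes it easier to compare the 403 and 405 cases.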
In the past, I have successfully scraped data (from another website) using the following:
for url in list_url:
    raw_html = urllib.request.urlopen(url).read()
    soup = None
    soup = BeautifulSoup(raw_html, "lxml")
Ideally, I would like to keep a similar structure, that is, pass the content of the fetched url to BeautifulSoup. Thanks!
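Roughly, the structure I'm hoping to keep could look like the sketch below. `fetch_raw_html`, `make_soups`, and the `parse` and `opener` parameters are hypothetical names introduced here for illustration. The sketch only reports and skips URLs the server refuses; it does not fix the 405 itself.

```python
import urllib.error
import urllib.request

HEADERS = {"User-Agent": "Mozilla/5.0"}  # same header as in the failing attempt


def fetch_raw_html(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the server refuses."""
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        print(f"{url}: HTTP Error {err.code}: {err.reason}")
        return None


def make_soups(list_url, parse, opener=urllib.request.urlopen):
    """Fetch each URL and hand the raw HTML to `parse`, skipping failures."""
    soups = {}
    for url in list_url:
        raw_html = fetch_raw_html(url, opener)
        if raw_html is not None:
            soups[url] = parse(raw_html)
    return soups
```

With BeautifulSoup imported, `parse` would be something like `lambda raw_html: BeautifulSoup(raw_html, "lxml")`, so each fetched page still ends up as a soup object.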
I'm not sure of the exact reason for the issue, but try this code; it works for me: