scraping google news headlines

Question

2.8k views Asked by user3353185 At 28 November 2014 at 23:54

Google News is searchable by keyword, and a search can then be narrowed down to a certain time period.

I tried running the search on the website and then using the URL of the results page to reproduce the search in Python:

import urllib2

# Results-page URL copied from a Google News search for "apple"
url = 'https://www.google.com/search?hl=en&gl=uk&tbm=nws&authuser=0&q=apple&oq=apple&gs_l=news-cc.3..43j0l9j43i53.5710.6848.0.7058.5.4.0.1.1.0.66.230.4.4.0...0.0...1ac.1.SRcIeXL5d48'

# Fetch the page and read the raw HTML
handler = urllib2.urlopen(url)
html = handler.read()

However, I get a 403 error. This method works with other websites, such as bbc.co.uk, so apparently Google does not want me to scrape the site with Python.

So I have two questions:

1) Is it possible to bypass this restriction Google has placed? If so, how?
2) Are there any other scrapeable news sites where I can search for news on a keyword for a given period?

For either option, I don't mind using a paid service, so such suggestions are welcome too.

Thanks in advance, K.

Original Q&A

There is 1 answer

**maruthu chandrasekaran** · Accepted Answer · 2014-11-29T00:12:55+00:00

Try setting a User-Agent header on the request, so that it does not go out with urllib2's default Python user agent:

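# Build the request for the search URL from the question
req = urllib2.Request(url)
# Send a browser-like User-Agent instead of the default Python one
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)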
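
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a minimal sketch of the User-Agent suggestion above, not a guaranteed way around the 403: the helper names build_request and fetch_html, the USER_AGENT string, and the 10-second timeout are all assumptions made for illustration, and Google may still refuse the request.

```python
import urllib.error
import urllib.request

# Assumed browser-like User-Agent string (hypothetical choice);
# any current browser string could be substituted.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=USER_AGENT, timeout=10):
    """Fetch url and return the decoded HTML, or None on an HTTP error status."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # A 403 here means the server still rejected the request.
        print("Request failed with HTTP status", err.code)
        return None
```

Calling fetch_html(url) with the search URL from the question would return the page HTML if the request is accepted, or None (after printing the status code) if the server still answers with an error such as 403.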
