Unable to scrape google news accurately

Question


975 views Asked by user3353185 At 08 December 2014 at 14:35

I'm trying to scrape Google News headlines for a given keyword (e.g. Blackrock) over a given period (e.g. 7-Jan-2012 to 14-Jan-2012). I construct the URL and then fetch it with urllib2, as shown in the code below. If I paste the constructed URL into a browser, it gives me the correct result. However, if I request it from Python, I get news results for the right keyword but for the current period instead of the requested dates. Here's the code. Can someone tell me what I'm doing wrong and how I can correct it?

import urllib
import urllib2
import json
from bs4 import BeautifulSoup
import requests

url = 'https://www.google.com/search?q=Blackrock&hl=en&gl=uk&authuser=0&source=lnt&tbs=cdr%3A1%2Ccd_min%3A7%2F1%2F2012%2Ccd_max%3A14%2F1%2F2012&tbm=nws'


req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3 Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)


html = response.read()
soup = BeautifulSoup(html)

text = soup.text

start = text.index('000 results')+11
end = text.index('NextThe selection')
text = text[start:end]
print text


There is 1 answer

**scandinavian_** · Accepted Answer · 2014-12-08T23:31:49+00:00


The problem is your User-Agent header. It works for me with:

req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36')

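You are sending a User-Agent string for Firefox 3, which is about six years old, so Google apparently serves it a version of the page that ignores the date-range filter.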
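
For readers on Python 3, where urllib2 no longer exists, the same fix might be sketched roughly as follows. This is only an illustration: `build_news_request` and `MODERN_UA` are hypothetical names, the query parameters simply mirror the URL in the question, and it assumes the search page still behaves the way it did when this answer was written.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# User-Agent string taken from the answer above (Chrome 39).
MODERN_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36')


def build_news_request(keyword, date_min, date_max, user_agent=MODERN_UA):
    """Build (but do not send) a request for date-restricted news results.

    Hypothetical helper: the parameters copy the question's URL, with the
    dates in day/month/year form as in the original.
    """
    params = {
        'q': keyword,
        'hl': 'en',
        'gl': 'uk',
        'authuser': '0',
        'source': 'lnt',
        # urlencode percent-encodes ':', ',' and '/' for us.
        'tbs': 'cdr:1,cd_min:{},cd_max:{}'.format(date_min, date_max),
        'tbm': 'nws',
    }
    url = 'https://www.google.com/search?' + urlencode(params)
    # Pass the header at construction time instead of calling add_header().
    return Request(url, headers={'User-Agent': user_agent})


req = build_news_request('Blackrock', '7/1/2012', '14/1/2012')
# Uncomment to perform the actual network request:
# html = urlopen(req).read()
```

Building the query with `urlencode` avoids hand-escaping the `tbs` value, and keeping the User-Agent in one constant makes it easy to swap in a current browser string if this one stops working.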
