I am trying to scrape Google News with pygooglenews.
I want to collect more than 100 articles at a time (Google caps results at 100 per query) by changing the target dates in a for loop. Below is what I have so far, but I keep getting this error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-4ada7169ebe7> in <module>
----> 1 df = pd.DataFrame(get_news('Banana'))
2 writer = pd.ExcelWriter('My Result.xlsx', engine='xlsxwriter')
3 df.to_excel(writer, sheet_name='Results', index=False)
4 writer.save()
<ipython-input-79-c5266f97934d> in get_titles(search)
9
10 for date in date_list[:-1]:
---> 11 search = gn.search(search, from_=date, to_=date_list[date_list.index(date)])
12 newsitem = search['entries']
13
~\AppData\Roaming\Python\Python37\site-packages\pygooglenews\__init__.py in search(self, query, helper, when, from_, to_, proxies, scraping_bee)
140 if from_ and not when:
141 from_ = self.__from_to_helper(validate=from_)
--> 142 query += ' after:' + from_
143
144 if to_ and not when:
TypeError: unsupported operand type(s) for +=: 'dict' and 'str'
import pandas as pd
from pygooglenews import GoogleNews
import datetime

gn = GoogleNews()

def get_news(search):
    stories = []
    start_date = datetime.date(2021,3,1)
    end_date = datetime.date(2021,3,5)
    delta = datetime.timedelta(days=1)
    date_list = pd.date_range(start_date, end_date).tolist()
    for date in date_list[:-1]:
        search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
        newsitem = search['entries']
        for item in newsitem:
            story = {
                'title':item.title,
                'link':item.link,
                'published':item.published
            }
            stories.append(story)
    return stories

df = pd.DataFrame(get_news('Banana'))
Thank you in advance.
It looks like you are correctly passing a string into get_news(), which is then passed on as the first argument (search) into gn.search().
However, you're reassigning search to the result of gn.search() in the line:
    search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
In the next iteration this reassigned search is passed into gn.search(), which expects a string query and fails when it tries to append ' after:' to it.
If you look at the code in pygooglenews, gn.search() appears to return a dict, which would explain the error.
To fix this, simply use a different variable for the result of gn.search() instead of reusing search.
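A minimal sketch of that fix, not tested against the live service. The names query, result and day are my own, and passing the client and the date range in as parameters is an assumption made so the function is easy to try out; the original hard-codes them. The only change that matters is that the response is stored in result, so the query string is never overwritten.

```python
import datetime


def get_news(query, gn, start_date, end_date):
    """Collect stories one day at a time.

    Assumes `gn` has a pygooglenews-style search(query, from_=..., to_=...)
    method that returns a dict with an 'entries' list.
    """
    stories = []
    delta = datetime.timedelta(days=1)
    day = start_date
    while day < end_date:
        # Keep the response under its own name; `query` stays a string.
        result = gn.search(
            query,
            from_=day.strftime('%Y-%m-%d'),
            to_=(day + delta).strftime('%Y-%m-%d'),
        )
        for item in result['entries']:
            stories.append({
                'title': item.title,
                'link': item.link,
                'published': item.published,
            })
        day += delta
    return stories
```

With the setup from the question this would be called as get_news('Banana', gn, datetime.date(2021, 3, 1), datetime.date(2021, 3, 5)), and the returned list can be passed to pd.DataFrame() as before.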