Using Python 2.7, I am trying to scrape and import articles from the NYT. I have done this before without problems, both for a single article and for several at once, but now I get the error AttributeError: 'module' object has no attribute 'Scraper'.
I am using the newspaper package, which has worked well until this error. It appears to work for some URLs and not for others, even though the URLs are correct. Any ideas on a solution?
Here is my code:
import pandas as pd
import newspaper
from newspaper import Article

url3 = 'http://www.nytimes.com/2010/08/04/nyregion/04shooting.html'
url4 = 'http://www.nytimes.com/2010/08/04/nyregion/04gunman.html'
url5 = 'http://www.nytimes.com/2010/08/05/nyregion/05shooting.html'
url6 = 'http://www.nytimes.com/2010/08/05/nyregion/05vics.html'
urls = [url3, url4, url5, url6]

Nyt_HBC = pd.DataFrame()
for i in urls:
    a = Article(i, language='en')
    a.download()
    a.parse()
    Nyt_HBC = Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
Nyt_HBC.columns = ['Title', 'Article']
Nyt_HBC
Here is the full error message (note: the code cannot run without the .parse() call):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-47-12545a6e9854> in <module>()
9 a=Article(i, language='en')
10 a.download()
---> 11 a.parse()
12 Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
13 Nyt_HBC.columns=['Title','Article']
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in parse(self)
226
227 if self.config.fetch_images:
--> 228 self.fetch_images()
229
230 self.is_parsed = True
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in fetch_images(self)
245 first_img = self.extractor.get_first_img_url(
246 self.url, self.clean_top_node)
--> 247 self.set_top_img(first_img)
248
249 if not self.has_top_image():
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in set_top_img(self, src_url)
399 def set_top_img(self, src_url):
400 if src_url is not None:
--> 401 s = images.Scraper(self)
402 if s.satisfies_requirements(src_url):
403 self.set_top_img_no_check(src_url)
AttributeError: 'module' object has no attribute 'Scraper'
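
Since the error shows up for some URLs and not others, one way to narrow it down is to run each URL separately and record which ones raise. The following is a minimal sketch, not a fix: `scrape_all` and `fetch_with_newspaper` are hypothetical helper names introduced here, and the sketch only reuses the `Article`, `download()` and `parse()` calls already shown in the code above.

```python
def scrape_all(urls, fetch):
    """Call fetch(url) for each URL and return (rows, failures)."""
    rows = []      # [title, text] for every URL that worked
    failures = []  # (url, error description) for every URL that raised
    for url in urls:
        try:
            title, text = fetch(url)
        except Exception as exc:
            # Keep going so one bad URL does not hide the others.
            failures.append((url, "%s: %s" % (type(exc).__name__, exc)))
        else:
            rows.append([title, text])
    return rows, failures


def fetch_with_newspaper(url):
    # Hypothetical wrapper around the calls used in the question.
    # Imported here so scrape_all stays usable without newspaper installed.
    from newspaper import Article
    a = Article(url, language='en')
    a.download()
    a.parse()
    return a.title, a.text
```

With this, `rows, failures = scrape_all(urls, fetch_with_newspaper)` would collect the working articles in `rows` (which can then be passed to `pd.DataFrame(rows, columns=['Title', 'Article'])`) and list the failing URLs with their error messages in `failures`. The traceback also shows that the failing call is only reached when `self.config.fetch_images` is true, so turning image fetching off might sidestep the error; how to set that option in this version of newspaper is an assumption that has not been verified here.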