Newspaper3k: how to retrieve cashed articles?

Question

Newspaper3k: how to retrieve cashed articles?

354 views Asked by Ahmad At 31 August 2020 at 13:53

This document says that that by default, newspaper caches all previously extracted articles and eliminates any article which it has already extracted.

>>> cbs_paper = newspaper.build('http://cbs.com')
>>> cbs_paper.size()
1030

>>> cbs_paper = newspaper.build('http://cbs.com')
>>> cbs_paper.size()
2

Okay, but it says nothing if I build once a website how I can retrieve the cashed articles?

Original Q&A

There are 1 answers

**Akash Sahu** · Answer 1 · 2020-09-10T07:08:20+00:00

newspaper3k uses memoize to cache the articles for a source

setting memoize to false would stop the caching mechanism

cbs_paper = newspaper.build('http://cbs.com', memoize_articles=False)

however, if you still want the caching and want to access the cached articles you can find .newspaper_scraper dir inside the temp folder (Path in windows machine)

C:\Users\your_user\AppData\Local\Temp\.newspaper_scraper\memoized

For Linux-based OSes, try looking in

/tmp/.newspaper_scraper/memoized/

For macOS, look in the directory specified by $TMPDIR. This may be any or none of the following:

/tmp/.newspaper_scraper/memoized/
/private/tmp/.newspaper_scraper/memoized/
~/Library/Caches/TemporaryItems/.newspaper_scraper/memoized/

TechQA.

Newspaper3k: how to retrieve cashed articles?

There are 1 answers

Related Questions in PYTHON-NEWSPAPER

Related Questions in NEWSPAPER3K

Popular Questions

Popular Tags

Trending Questions