Newspaper3k scrape several websites

Question


Asked by AudioBubble on 07 October 2020 at 19:01 · 475 views

I want to get articles from several websites. I tried the following, but I don't know what to do next:

import newspaper
from newspaper import news_pool

lm_paper = newspaper.build('https://www.lemonde.fr/')
parisien_paper = newspaper.build('https://www.leparisien.fr/')

papers = [lm_paper, parisien_paper]
news_pool.set(papers, threads_per_source=2) # (2*2) = 4 threads total
news_pool.join()


1 answer

**Life is complex** · Accepted Answer · 2020-10-13T13:41:35+00:00

Below is how you can use newspaper's news_pool. Note that news_pool is slow: it can take several minutes before any titles start printing. I believe this lag comes from the articles being downloaded in the background, and I'm unsure how to speed it up using Newspaper.

import newspaper
from newspaper import Config
from newspaper import news_pool

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

config = Config()
config.browser_user_agent = USER_AGENT
config.request_timeout = 10

lm_paper = newspaper.build('https://www.lemonde.fr/', config=config, memoize_articles=False)
parisien_paper = newspaper.build('https://www.leparisien.fr/', config=config, memoize_articles=False)
french_papers = [lm_paper, parisien_paper]

# adjustable: thread count stored on the pool's Config
news_pool.config.number_threads = 2

# adjustable: thread timeout in seconds, stored on the pool's Config
news_pool.config.thread_timeout_seconds = 1

news_pool.set(french_papers)
news_pool.join()

for source in french_papers:
    for article_extract in source.articles:
        # skip articles whose download produced no HTML
        if article_extract.html:
            article_extract.parse()
            print(article_extract.title)
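If the main complaint is the wait before anything prints, one option is to skip news_pool and download and parse the articles yourself with Python's standard `concurrent.futures`, printing each title as soon as its own download finishes. This is a minimal sketch, not part of newspaper's API and not tested against the sites above. It assumes only that each item exposes `download()`, `parse()` and `title`, as newspaper's `Article` objects do. The helper names `download_and_parse` and `iter_titles` are hypothetical, and `max_workers=4` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_and_parse(article):
    """Download and parse one article; return its title, or None on failure."""
    try:
        article.download()
        article.parse()
    except Exception as exc:  # newspaper raises ArticleException on a failed parse
        print(f"skipped {getattr(article, 'url', '?')}: {exc}")
        return None
    return article.title


def iter_titles(articles, max_workers=4):
    """Yield titles in completion order, so output starts as soon as the
    first download finishes instead of after every download is done."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_and_parse, a) for a in articles]
        for future in as_completed(futures):
            title = future.result()
            if title:
                yield title


# Example usage with the sources built above (requires network access):
# for source in french_papers:
#     for title in iter_titles(source.articles, max_workers=4):
#         print(title)
```

The main difference from the news_pool version is when output appears: titles print as individual downloads complete rather than only after `news_pool.join()` returns. Keep `max_workers` modest so the sites are not flooded with requests.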
