newspaper3k - get articles from HTML instead of URL

Question


Asked by Milano on 13 July 2021 at 10:34 (598 views)

I'm using newspaper3k inside a Scrapy parse method. I want to extract article links, but I don't want to fetch the website a second time.

Is it possible to use this:

newspaper.build(..)

with plain HTML, so that I can then call .articles?


There is 1 answer

**Dmitrii K** · Answer 1 · 2022-05-27T11:10:06+00:00


I found this solution:

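import httpx
from newspaper import Article

async def get_article(url):
    # Fetch the page once; AsyncClient has to be entered with "async with".
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

    # Hand the downloaded HTML to newspaper3k so it does not fetch the URL again.
    article = Article(url)
    article.set_html(response.text)
    article.parse()
    return article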
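
The answer covers a single Article built from HTML that is already in hand. For the link-extraction half of the question, one possible workaround is to collect the links from that same HTML with the standard library. This is a minimal sketch, not newspaper3k's own article discovery: `LinkCollector` and `extract_links` are hypothetical names introduced here, and the sketch returns every `<a href>` target without judging whether it points to an article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            absolute = urljoin(self.base_url, href)
            # Keep first-seen order and skip duplicates.
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the de-duplicated absolute links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Inside a Scrapy callback this could be called as `extract_links(response.text, response.url)`, so the page Scrapy already downloaded is reused. The HTML of each followed link can then be passed to `Article.set_html` as in the answer.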
