Web scraping news articles and keyword search


I have code that fetches the titles of news articles from web pages. I use a for loop to get the titles from several news websites. I have also implemented a word search that counts the number of articles whose titles contain the word "coronavirus". I want the word search to report the number of matching articles for each website separately. Right now I am getting a single count for all the websites put together. Please help me, I have to submit this project shortly. Following is the code:

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
from newspaper import Article
import requests

URL = ["https://www.timesnownews.com/coronavirus",
       "https://www.indiatoday.in/coronavirus",
       "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]

for url in URL:
    parser = 'html.parser'
    resp = requests.get(url)

    # Work out the page encoding from the HTTP header or the declared HTML encoding
    http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
    html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
    encoding = html_encoding or http_encoding
    soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)

    # Collect every link on the page, skipping javascript: pseudo-links
    links = []
    for link in soup.find_all('a', href=True):
        if "javascript" in link["href"]:
            continue
        links.append(link['href'])

    count = 0
    for link in links:
        try:
            article = Article(link)
            article.download()
            article.parse()
            print(article.title)
            if ("COVID" in article.title or "coronavirus" in article.title
                    or "Coronavirus" in article.title or "Covid-19" in article.title
                    or "COVID-19" in article.title):
                count += 1
        except:
            pass

print("Number of articles with the word COVID:")
print(count)

1 Answer

Answered by Arthur Pereira (accepted answer)

Actually, you are getting only the last site's count: count is reset to 0 at the start of each loop iteration, and the print after the loop only sees the value from the final site. If you want the counts for all of them, append each site's count to a list; then you can print the count for each site.

First create an empty list and append the final count on each iteration:

URL = ["https://www.timesnownews.com/coronavirus", "https://www.indiatoday.in/coronavirus",
       "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
Url_count = []

for url in URL:
    parser = 'html.parser'
    ...
    ...
        except:
            pass

    Url_count.append(count)

Then you can use zip to print the results:

for url, count in zip(URL, Url_count):
    print("Site:", url, "Count:", count)