Python script fails to parse newspaper article when run in a virtual machine


I've written a simple Python script for news summarization that uses the newspaper3k library on Python 3.10. It runs fine on my personal laptop. I then copied the libraries and the script to a virtual machine in our organization and ran it there from PyCharm, but article.parse() raises an error.

Here's the script:

import nltk
import newspaper
from textblob import TextBlob
from newspaper import Article
from newspaper import Config

url = "https://press.un.org/en/2023/sc15277.doc.htm"

config = Config()
config.request_timeout = 60

output = Article(url, config=config)
print(f'URL: {output.url}')

output.download()
output.parse()
output.nlp()
print(f'Summary: {output.summary}')

The error I get is:

URL: https://press.un.org/en/2023/sc15277.doc.htm
Traceback (most recent call last):
  File "C:\Users\----------\PycharmProjects\pythonProject\main.py", line 14, in <module>
    output.parse()
  File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 531, in throw_if_not_downloaded_verbose
    raise ArticleException('Article `download()` failed with %s on URL %s' %
newspaper.article.ArticleException: Article `download()` failed with HTTPSConnectionPool(host='press.un.org', port=443): Max retries exceeded with url: /en/2023/sc15277.doc.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1002)'))) on URL https://press.un.org/en/2023/sc15277.doc.htm

Process finished with exit code 1

I tried adding the website's certificate in PyCharm and changing the proxy settings, but the error persists. The URL is accessible from the virtual machine. I also tested connectivity to the URL from within PyCharm, and the connection was successful.
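For reference, the "self signed certificate in certificate chain" message often means that something on the network path (for example a corporate TLS-inspecting proxy) re-signs HTTPS traffic with a root certificate that Python does not trust. Python's requests library, which newspaper3k uses for downloads, does not read PyCharm's certificate settings. The sketch below is one possible way to test that theory, not a confirmed fix. It assumes the organization's root CA has been exported to a PEM file; the path in the usage comment and the helper names `configure_ca_bundle` and `summarize` are hypothetical. It points requests at that file through the `REQUESTS_CA_BUNDLE` environment variable, fetches the page with requests directly, and hands the HTML to `Article.download(input_html=...)`.

```python
import os
from pathlib import Path


def configure_ca_bundle(ca_bundle_path: str) -> str:
    """Make requests trust the CAs in a PEM file (hypothetical helper).

    Sets REQUESTS_CA_BUNDLE, which requests consults when verifying
    HTTPS certificates. Returns the resolved path.
    """
    path = Path(ca_bundle_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")
    os.environ["REQUESTS_CA_BUNDLE"] = str(path)
    return str(path)


def summarize(url: str, ca_bundle_path: str) -> str:
    """Download with requests using the custom CA bundle, then parse with newspaper3k."""
    # Imported here so the helper above works even without these packages installed.
    import requests
    from newspaper import Article

    bundle = configure_ca_bundle(ca_bundle_path)

    # verify=<path> tells requests which CA bundle to validate against.
    response = requests.get(url, timeout=60, verify=bundle)
    response.raise_for_status()

    article = Article(url)
    # Supplying input_html skips newspaper's own network request.
    article.download(input_html=response.text)
    article.parse()
    article.nlp()
    return article.summary


# Example call (the PEM path is an assumption, replace it with the real export):
# print(summarize("https://press.un.org/en/2023/sc15277.doc.htm",
#                 r"C:\certs\corporate-root-ca.pem"))
```

Note that the file given to `REQUESTS_CA_BUNDLE` is used instead of the default bundle, so it has to contain every root certificate the script needs. If the request succeeds with the exported certificate, the missing trusted root was the likely cause.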
