When I run
import newspaper
paper = newspaper.build('http://cnn.com', memoize_articles=False)
print(len(paper.articles))
I see that newspaper found 902 articles from http://cnn.com, which seems quite few to me, considering that they publish many articles per day and have published articles online for many years. Are these really all the articles there are on http://cnn.com? If not, is there any way I can find the URLs of the rest of the articles too?
Newspaper only queries the items on the main page of CNN, so the module does not query all the categories (e.g. business, health, etc.) on the domain. Based on my code, Newspaper discovers only 698 unique articles as of today. Some of these may still be duplicates, because some of the URLs differ only by a hash fragment but appear to point to the same article.
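To check for the hash duplicates mentioned above, one option is to strip the fragment (the part after `#`) from each URL before counting. This is only a rough sketch, not the code I used for the 698 figure: the helper name `unique_article_urls` and the sample URLs are made up for illustration, and it assumes that two URLs differing only in their fragment are the same article.

```python
from urllib.parse import urldefrag


def unique_article_urls(urls):
    """Strip '#fragment' from each URL and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        base, _fragment = urldefrag(url)  # base is the URL without the '#...' part
        if base not in seen:
            seen.add(base)
            unique.append(base)
    return unique


# Hypothetical sample URLs, not real CNN links.
sample = [
    "http://cnn.com/2020/01/01/world/story/index.html",
    "http://cnn.com/2020/01/01/world/story/index.html#comments",
    "http://cnn.com/2020/01/02/health/other/index.html",
]
print(len(unique_article_urls(sample)))
```

With the `paper` object from the question, you would pass in `(a.url for a in paper.articles)` instead of `sample`.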
P.S. You can query all the categories, but that requires Selenium coupled with Newspaper.