When I run the Python code
import newspaper
print(len(newspaper.build('http://cnn.com', memoize_articles=False).articles))
exit()
in Python 3 I get the output 897 (i.e. newspaper3k found 897 pages considered articles on the domain http://cnn.com), but when I run
import newspaper
print(len(newspaper.build('http://www.cnn.com', memoize_articles=False).articles))
exit()
(i.e., with an additional www.; nothing else has changed) I only get 895. These numbers are consistent when I switch back and forth between these two URLs. Is the www. actually significant in a URL? If so, why are the article counts so similar for these two URLs when using the newspaper3k library? Otherwise, why isn't the article count exactly the same?
As you can see below, several URLs in the result for the www-less address appear in two variants:
www
www
result:
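
Separately from the listing above, here is a minimal sketch of how one might check for such duplicates. It assumes you have already collected the article URLs as plain strings; the helper names strip_www and www_duplicates and the sample URLs are made up for illustration and are not part of newspaper3k.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_www(url):
    """Return the URL with a leading 'www.' removed from its host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def www_duplicates(urls):
    """Group URLs that differ only by a leading 'www.' in the host."""
    groups = defaultdict(set)
    for url in urls:
        groups[strip_www(url)].add(url)
    # Keep only the groups that really contain more than one variant.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample URLs, not real output from newspaper3k.
sample = [
    "http://cnn.com/2015/01/01/world/story/index.html",
    "http://www.cnn.com/2015/01/01/world/story/index.html",
    "http://www.cnn.com/2015/01/02/us/other/index.html",
]

duplicates = www_duplicates(sample)
for canonical, variants in duplicates.items():
    print(canonical, "->", variants)
```

To try this on real data, the list could be built from the url attribute of each article, e.g. [a.url for a in newspaper.build('http://cnn.com', memoize_articles=False).articles]. If the two builds differ mostly in such www / non-www pairs, that would be consistent with the small difference in counts, but this sketch only detects the pairs and does not explain how the library itself deduplicates.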