I am learning Scrapy and am having a hard time figuring out this issue. My spider will not crawl the Macy's website and keeps throwing the following error:
[<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Things I've tried so far:
- Setting request headers and ROBOTSTXT_OBEY per this thread: Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set
- Changing the user agent per this thread: How to prevent a twisted.internet.error.ConnectionLost error when using Scrapy?
- Installing cryptography<2 per this thread: Scrapy twisted connection lost in non-clean fashion. No proxy. Already tried headers
- Applying the monkeypatch from this thread: Twisted Python Failure - Scrapy Issues
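For reference, the header, robots.txt and user-agent changes above amount to settings like these in settings.py. This is a minimal sketch; the User-Agent string and header values are illustrative placeholders.

```python
# settings.py (sketch): values below are illustrative placeholders

# Send a browser-like User-Agent instead of Scrapy's default "Scrapy/x.y (+https://scrapy.org)"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0 Safari/537.36"
)

# Do not fetch or honor robots.txt before crawling
ROBOTSTXT_OBEY = False

# Headers added to every request unless the request sets them itself
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```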
I also ran scrapy shell "www.macys.com" from the command prompt and got the same error, so I'm guessing the issue is not with my spider. Could someone please help?
It seems that the IP address you are launching your scraper from has been blacklisted.
You might want to read the following: https://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned
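One technique that page suggests is rotating the User-Agent from a pool of real browser strings. A minimal sketch of that as a downloader middleware follows; the class name, the project name myproject, and the User-Agent pool are hypothetical placeholders.

```python
# middlewares.py (sketch): class name and User-Agent pool are hypothetical placeholders
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a randomly chosen User-Agent on each request."""

    # Hypothetical pool of browser User-Agent strings; keep these current in practice
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    ]

    def process_request(self, request, spider):
        # Replace whatever User-Agent is already on the request with one from the pool
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None lets Scrapy continue handling the request as usual
        return None
```

To enable it, register it in settings.py, e.g. DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentMiddleware": 543}. At that priority it runs after Scrapy's built-in UserAgentMiddleware (priority 500), so it replaces the value taken from USER_AGENT. If the IP itself is blocked, changing the User-Agent alone is unlikely to help; the same page also covers rotating IPs through proxies for that case.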
Also, you might want to tune the settings that control how many requests Scrapy sends and how quickly: CONCURRENT_REQUESTS, DOWNLOAD_DELAY, etc.
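A sketch of a more conservative request rate in settings.py might look like this. The numbers are illustrative starting points, not values tuned for this site.

```python
# settings.py (sketch): numbers are illustrative, not tuned for any particular site

# Cap parallelism overall and per domain (Scrapy's defaults are 16 and 8)
CONCURRENT_REQUESTS = 4
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Wait roughly this many seconds between requests to the same site;
# with RANDOMIZE_DOWNLOAD_DELAY the actual wait is 0.5x to 1.5x this value
DOWNLOAD_DELAY = 2.0
RANDOMIZE_DOWNLOAD_DELAY = True

# Let the AutoThrottle extension adjust the delay based on server response latency
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

With AutoThrottle enabled, DOWNLOAD_DELAY acts as a lower bound on the delay it picks. Slowing down reduces the chance of triggering a new ban, but it will not lift an IP block that is already in place.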