While running a Scrapy spider, I see log messages prefixed with "DEBUG:", such as: 1. DEBUG: Crawled (200) (GET http://www.example.com) (referer: None) 2. DEBUG: Scraped from (200 http://www.example.com)
I want to know: 1. What do "Crawled" and "Scraped from" mean? 2. Where do the URLs in those two messages come from (i.e. which variable or argument holds those URLs while the page is being scraped)?
Let me try to explain based on the Scrapy sample code shown on the Scrapy website. I saved this in a file, scrapy_example.py.
Executing this with the command
scrapy runspider scrapy_example.py
produces log output that includes both kinds of message.
Crawled means: Scrapy has downloaded that webpage.
Scraped means: Scrapy has extracted some data from that webpage.
The URL is given in the script as the start_urls attribute of the spider class.
Your output must have been generated by running a spider. Search for the file where that spider is defined and you should be able to spot the place where the URL is defined.