I am using Scrapy to scrape a page:

http://feeds.reuters.com/reuters/companyNews

I have tried many times, and the following consistently returns an empty result in the Scrapy shell:

response.xpath('//*[@class="itemtitle"]/a/text()').extract()

whereas in the Chrome console the same XPath gives me the expected result:

$x('//*[@class="itemtitle"]/a/text()')[0]

I checked the robots.txt for the target URL and found the following:

User-agent: *
Disallow: /~a/

I am wondering whether this means scraping the page is not allowed.
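
One way to check what those two rules cover is to feed them to Python's standard `urllib.robotparser`. This is only a minimal sketch, assuming the two lines quoted above are the complete robots.txt; the `/~a/example` path is made up purely to show a blocked URL:

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, assumed to be the whole robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~a/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

feed_url = "http://feeds.reuters.com/reuters/companyNews"
# Hypothetical path under /~a/, used only for illustration.
blocked_url = "http://feeds.reuters.com/~a/example"

feed_allowed = parser.can_fetch("*", feed_url)
blocked_allowed = parser.can_fetch("*", blocked_url)

print("feed allowed:", feed_allowed)
print("/~a/ path allowed:", blocked_allowed)
```

Under that assumption, the `Disallow` rule only applies to paths starting with `/~a/`, so it does not seem to cover the feed URL itself.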

So my specific question is: can a site prevent robots from scraping certain pages? If not, what could be wrong with my code that makes it return an empty result in the Scrapy shell?
