I am using a Scrapy CrawlSpider to count how many times specific words appear on each page of a domain. My code mostly works, but I would like the match to be case insensitive and to count whole words only. For example, when counting 'demo', I would like it to also count 'Demo' and 'DEMO' but not 'democracy'. Here is what I have so far:
    def parse_item(self, response):
        yield {
            'demo': len(response.css('body').re('demo')),
        }
For the case sensitivity issue, I have found advice that suggests using XPath's translate() or re.IGNORECASE. For the whole-word issue, I have found advice on using word boundaries (\b). However, I am not sure how to incorporate either of them into the .re() call above. I have tried and failed a number of times.
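For reference, this is roughly how the two pieces of advice combine in plain Python `re`, outside of Scrapy. It is only a sketch: the `sample` string is made up for illustration and stands in for the page text.

```python
import re

# Sample text standing in for a page body (made up for illustration).
sample = "Demo page: a DEMO of the demo app, not a democracy."

# \b marks a word boundary; re.IGNORECASE makes the match case insensitive.
pattern = re.compile(r"\bdemo\b", re.IGNORECASE)
matches = pattern.findall(sample)

# The inline flag (?i) at the start of the pattern has the same effect
# as passing re.IGNORECASE, which helps when only a pattern string can be given.
inline_matches = re.findall(r"(?i)\bdemo\b", sample)

print(len(matches))
```

Here 'democracy' is skipped because no word boundary follows its 'demo' prefix.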
Edit
The following fix solves the problem:
    def parse_item(self, response):
        yield {
            'demo': len(response.css('body').re(r'(?i)\bdemo\b')),
        }