I'm using Scrapy, and I read in the docs about the setting CONCURRENT_REQUESTS. The documentation describes it as "The maximum number of concurrent (i.e. simultaneous) requests that will be performed by the Scrapy downloader."
I created a spider to scrape questions and answers from Q&A websites, and I want to know whether it is possible to run multiple concurrent requests. Right now I have this value set to 1, because I don't want to lose any Item or overwrite one. My main doubt is that I keep a global ID, idQuestion (used to build an idQuestion.idAnswer ID), for every item, so I don't know whether making multiple concurrent requests could mix things up, losing some Items or assigning wrong IDs.
This is a snippet of code:
    class Scraper(scrapy.Spider):
        uid = 1

        def parse_page(self, response):
            # Scraping a single question
            item = ScrapeItem()
            hxs = HtmlXPathSelector(response)
            #item['date_time'] = response.meta['data']
            item['type'] = "Question"
            item['uid'] = str(self.uid)
            item['url'] = response.url
            # Do some scraping.
            ans_uid = ans_uid + 1
            item['uid'] = str(str(self.uid) + (":" + str(ans_uid)))
            yield item
            # Call the method recursively on the next page.
            print("NEXT -> " + str(composed_string))
            yield scrapy.Request(composed_string, callback=self.parse_page)
This is the skeleton of my code. I use uid to store the ID of a single question and ans_uid for the answer. For example:
1.1) Ans 1 for Question 1
1.2) Ans 2 for Question 1
1.3) Ans 3 for Question 1
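To make the numbering concrete, here is a minimal, standalone sketch of the composed-ID scheme I have in mind (the helper `compose_uid` is just for illustration, it is not part of my spider):

```python
# Sketch of the composed-ID scheme: "question_uid:answer_uid".
# compose_uid is a hypothetical helper, not part of the actual spider.
def compose_uid(question_uid, answer_index):
    # e.g. question 1, answer 2 -> "1:2"
    return str(question_uid) + ":" + str(answer_index)

question_uid = 1
# IDs for the three answers of question 1
answer_ids = [compose_uid(question_uid, i) for i in range(1, 4)]
print(answer_ids)  # -> ['1:1', '1:2', '1:3']
```

My worry is what happens to these counters when several responses are being parsed "at the same time".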
**Can I simply increase the CONCURRENT_REQUESTS value without compromising anything?**