I'm using Scrapy and I read in the docs about the setting CONCURRENT_REQUESTS. It says: "The maximum number of concurrent (ie. simultaneous) requests that will be performed by the Scrapy downloader."
I created a spider to scrape questions and answers from Q&A websites, and I want to know whether it is possible to run multiple concurrent requests. For now I have set this value to 1, because I don't want to lose any items or overwrite any of them. My main doubt is that I use a global ID, idQuestion (to build an idQuestion.idAnswer pair), for every item, so I don't know whether making multiple requests at once could get everything mixed up and lose some items or assign wrong IDs.
This is a snippet of code:
class Scraper(scrapy.Spider):
    uid = 1

    def parse_page(self, response):
        # Scrape a single question
        item = ScrapeItem()
        hxs = HtmlXPathSelector(response)
        #item['date_time'] = response.meta['data']
        item['type'] = "Question"
        item['uid'] = str(self.uid)
        item['url'] = response.url
        ans_uid = 0
        # Do some scraping.
        ans_uid = ans_uid + 1
        item['uid'] = str(self.uid) + ":" + str(ans_uid)
        yield item
        # Call the method recursively on the next page.
        print("NEXT -> " + str(composed_string))
        yield scrapy.Request(composed_string, callback=self.parse_page)
This is the skeleton of my code. I use uid to keep track of the id of each question and ans_uid for its answers. Ex:
1) Question
1.1) Ans 1 for Question 1
1.2) Ans 2 for Question 1
1.3) Ans 3 for Question 1
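To make my concern concrete, here is a plain-Python simulation of the id scheme (no Scrapy involved, and all the function names are made up): each "request" captures its question uid at scheduling time, the way request.meta would, so even when the "responses" come back in a different order, every answer still gets the right idQuestion:idAnswer pair.

```python
import itertools

# Shared counter: hands out a unique question uid per scheduled request.
uid_counter = itertools.count(1)

def schedule_question():
    # Capture the uid when the request is created (like request.meta).
    return {"uid": next(uid_counter)}

def parse_answers(meta, n_answers):
    # Build "idQuestion:idAnswer" strings from the captured uid.
    return [str(meta["uid"]) + ":" + str(i) for i in range(1, n_answers + 1)]

# Requests are scheduled in order...
reqs = [schedule_question() for _ in range(3)]
# ...but responses arrive out of order (simulating concurrency).
out_of_order = [reqs[2], reqs[0], reqs[1]]
results = [parse_answers(meta, 2) for meta in out_of_order]
print(results)
# → [['3:1', '3:2'], ['1:1', '1:2'], ['2:1', '2:2']]
```

In this sketch the ids stay consistent because each uid is bound to its request up front, rather than read from shared spider state inside the callback.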
**Can I simply increase the CONCURRENT_REQUESTS value without compromising anything?**