Info on Scrapy CONCURRENT_REQUESTS in Python

I'm using Scrapy, and I read in the docs about the CONCURRENT_REQUESTS setting. It says: "The maximum number of concurrent (i.e. simultaneous) requests that will be performed by the Scrapy downloader."

I created a spider to scrape questions and answers from Q&A websites, and I want to know whether it is possible to run multiple concurrent requests. For now I have set this value to 1, because I don't want to lose any items or overwrite one. My main doubt is that I keep a global ID, idQuestion (used to build an idQuestion.idAnswer id), for every item, so I don't know whether making multiple requests could make a mess and lose items or assign wrong IDs.
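To make the worry concrete, here is a plain-Python sketch (no Scrapy, names illustrative) of why a counter shared across callbacks is fragile when requests run concurrently: responses can come back in any order, so the counter is bumped in arrival order, not in the order the pages were requested.

```python
# Sketch only: simulates callbacks arriving out of order when
# CONCURRENT_REQUESTS > 1, all bumping one shared counter.
uid = 0  # shared "global" question counter, like the spider attribute

def parse_page(url):
    # Each callback increments the shared counter on arrival.
    global uid
    uid += 1
    return (url, uid)

# Suppose page2's response happens to arrive before page1's:
arrival_order = ["page2", "page1"]
tagged = [parse_page(url) for url in arrival_order]
# page2 is tagged with uid 1 and page1 with uid 2,
# regardless of which request was scheduled first.
```

The ids are still unique (callbacks don't run mid-statement over each other), but they no longer correspond to the crawl order you had in mind.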

This is a snippet of code:

import scrapy
from scrapy.selector import HtmlXPathSelector

class Scraper(scrapy.Spider):
    uid = 1  # global question counter, shared by all callbacks

    def parse_page(self, response):
        # Scrape a single question.
        item = ScrapeItem()
        hxs = HtmlXPathSelector(response)
        #item['date_time'] = response.meta['data']
        item['type'] = "Question"
        item['uid'] = str(self.uid)
        item['url'] = response.url

        # Do some scraping; one answer counter per question.
        ans_uid = 0
        ans_uid = ans_uid + 1
        item['uid'] = str(self.uid) + ":" + str(ans_uid)
        yield item

        # Call the method recursively on the next page.
        print("NEXT -> " + str(composed_string))
        yield scrapy.Request(composed_string, callback=self.parse_page)

This is the skeleton of my code. I use uid to memorize the id of a single question and ans_uid for the answer. Ex:

1) Question

1.1) Ans 1 for Question 1

1.2) Ans 2 for Question 1

1.3) Ans 3 for Question 1
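The numbering above maps onto the uid/ans_uid pair in the snippet; written as a plain function (the name compose_uid is illustrative, not from the code, and the snippet joins the two parts with ":" rather than the "." used in the listing):

```python
def compose_uid(question_uid, ans_uid=None):
    # A question gets the bare counter; an answer gets "question:answer",
    # matching str(self.uid) + ":" + str(ans_uid) in the spider.
    if ans_uid is None:
        return str(question_uid)
    return str(question_uid) + ":" + str(ans_uid)

# Question 1 followed by its three answers:
ids = [compose_uid(1)] + [compose_uid(1, a) for a in range(1, 4)]
# ids == ["1", "1:1", "1:2", "1:3"]
```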

**Can I simply increase the CONCURRENT_REQUESTS value without compromising anything?**
