Scrapy: yield FormRequest prints None?


I am writing a spider to scrape a website:

The first URL, www.parenturl.com, is handled by the parse function. From there I extract the URL www.childurl.com and request it with a callback to the parse2 function, which returns a dict.

Question 1) How can I store that dict in a MySQL database together with 7 other values that I extracted from the parent URL in the parse function? (response_url prints None.)

def parse(self, response):
    for i in range(0,2):
        url = response.xpath('//*[@id="response"]').extract()
        response_url=yield SplashFormRequest(url,method='GET',callback=self.parse2)
        print response_url # prints None

def parse2(self, response):
    dict = {'url': response.url}
    return dict

There are 2 answers

elacuesta (BEST ANSWER)

Storing the result of the second callback on the spider object and then printing it is not guaranteed to work because of Scrapy's asynchronous nature. Instead, you could pass additional data to callback functions, something like:

def parse(self, response):
    for i in range(0, 2):
        item = ...  # extract some information
        url = ...  # construct URL
        yield SplashFormRequest(url, callback=self.parse2, meta={'item': item})

def parse2(self, response):
    item = response.meta['item']  # get data from previous parsing method
    item.update({'key': 'value'})  # add more information
    print item  # do something with the "complete" item
    return item
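To cover the MySQL part of the question: the dict returned from parse2 is handed to Scrapy as an item, so the database write fits naturally in an item pipeline rather than in the callbacks. A minimal sketch, assuming the pymysql driver and a placeholder table scraped_items with a url column (the connection parameters, database and table names are assumptions, and the class must be enabled under ITEM_PIPELINES in settings.py):

    import pymysql

    class MySQLStorePipeline(object):
        def open_spider(self, spider):
            # Connect once when the spider starts (credentials are placeholders).
            self.conn = pymysql.connect(host='localhost', user='user',
                                        password='secret', db='scraping')
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            # `item` is the dict returned from parse2, including any values
            # carried over from parse via meta.
            self.cursor.execute("INSERT INTO scraped_items (url) VALUES (%s)",
                                (item['url'],))
            self.conn.commit()
            return item

        def close_spider(self, spider):
            self.conn.close()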
Tobey

You cannot assign the result of a yield expression to a variable here; yield acts like a return and hands the request to Scrapy's engine, it does not give you back the downloaded response.

Try removing the assignment:

def parse(self, response):
    self.results = []
    for i in range(0, 2):
        url = response.xpath('//*[@id="response"]').extract()
        request = SplashFormRequest(url, method='GET', callback=self.parse2)
        yield request
    print self.results

def parse2(self, response):
    # print response here!
    result = {'url': response.url}
    self.results.append(result)
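Note that the print self.results in parse will normally run before any parse2 callback has fired, so it prints an empty list. If you want to inspect the collected results with this approach, the spider's closed hook runs after all requests have finished. A minimal sketch, reusing the self.results list from the code above:

    def closed(self, reason):
        # Called once the spider finishes, after every parse2 callback
        # has appended its dict to self.results.
        print(self.results)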