Proper way of constructing Scrapy start_requests()

Asked by copser on 10 June 2015 at 05:01

I want to scrape the TOP TEN RELEASES table from Torrenting.com, and I have written a crawler for that purpose, but you first need to be logged in to the site. The data I scraped at first was basically empty, so I started rebuilding my torrent_spider.py to log in first. Because I am new to web scraping, I am stuck with this issue.

I am reading the Scrapy docs on this, and I have found that start_requests() should let me log in to Torrenting.com before I start scraping the table.

My question is: can someone explain how I request the https://www.torrenting.com/browse.php page after my spider has logged in, so that I can start scraping the data I want?

This is torrent_spider.py:

import scrapy
from scrapy import Spider
from scrapy.selector import Selector


class TorrentSpider(Spider):
    """ TorrentSpider that will scrape the Top Ten Releases table. """
    name = "torrenting"
    allowed_domains = ["torrenting.com"]
    # Only read by the default start_requests(); ignored once it is overridden.
    start_urls = [
        "https://www.torrenting.com/browse.php",
    ]

    def start_requests(self):
        # Must be named start_requests (plural) for Scrapy to call it;
        # the login POST is the first request this spider sends.
        return [scrapy.FormRequest("https://www.torrenting.com/login.php?returnto=Login",
                                   formdata={'user': 'example', 'pass': 'somepass'},
                                   callback=self.logged_in)]

    def logged_in(self, response):
        # Called with the response to the login POST.
        pass


    def parse(self, response):
        pass
