I want to scrape the TOP TEN RELEASES table from Torrenting.com. I have made a crawler for that purpose, but the site requires you to be logged in first. The initial data I scraped was basically empty, so I started rebuilding my torrent_spider.py
for that purpose, and because I am new to web scraping I am stuck on this issue.
I have been reading the Scrapy docs on this and found that start_requests()
should let me log in to Torrenting before scraping the table.
My question is: can someone explain how I request the https://www.torrenting.com/browse.php
page after my spider is logged in, so I can start scraping the data I want?
This is torrent_spider.py:
import scrapy
from scrapy import Spider
from scrapy.selector import Selector


class TorrentSpider(Spider):
    """ TorrentSpider that will scrape the Top Ten Releases table. """
    name = "torrenting"
    allowed_domains = ["torrenting.com"]
    start_urls = [
        "https://www.torrenting.com/browse.php",
    ]

    # Scrapy only calls a method named start_requests (plural); once it is
    # overridden like this, start_urls above is no longer used.
    def start_requests(self):
        return [scrapy.FormRequest(
            "https://www.torrenting.com/login.php?returnto=Login",
            formdata={'user': 'example', 'pass': 'somepass'},
            callback=self.logged_in,
        )]

    def logged_in(self, response):
        # Called with the response to the login POST.
        pass

    def parse(self, response):
        pass