Achieving Next page through javascript in scrapy python with splash?

Question

2.6k views Asked by AudioBubble At 20 November 2014 at 07:55

My intention is to follow the Next link, which is implemented as href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')". As an example I am using [this url][1]. The Next link is at the bottom of that page. In the HTML it is written as href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')", so the href is effectively just #. I am trying to collect those hrefs even though they are #.

    from urlparse import urljoin  # Python 2; on Python 3 use urllib.parse

    from scrapy.http import Request
    from scrapy.selector import Selector

    def parse(self, response):
        selector = Selector(response)
        for link in selector.css('span.PSEDITBOX_DISPONLY').re(r'.*>(\d+)<.*'):
            abc = 'xxxx'  # listing URL (redacted in the original post)
            # meta={"use_splash": False} was commented out here
            yield Request(abc, callback=self.parse_listing_page, dont_filter=True)

        # extract() returns the whole <a> element as an HTML string, not a URL
        nav_pages = selector.css('div#win0divHRS_APPL_WRK_HRS_LST_NEXT a').extract()
        print(nav_pages)
        for nav_page in nav_pages:
            # Pass the URL back to parse()
            yield Request(urljoin('xxx', nav_page), self.parse, dont_filter=True)

When I run the code above I get "HTTP status code is not handled or not allowed", in other words no results. Can anyone tell me how to follow Next through the href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')" call, and why the result is empty? I have also noticed something odd in the HTML: for example, on one of the pages reached through Next the anchor tag is "<a id="HRS_APPL_WRK_HRS_LST_NEXT" class="PSHYPERLINK" href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');" tabindex="74" ptlinktgt="pt_replace" name="HRS_APPL_WRK_HRS_LST_NEXT"></a>", with no link text.

Thanks in advance

Output:

[u'<a name="HRS_APPL_WRK_HRS_LST_NEXT" id="HRS_APPL_WRK_HRS_LST_NEXT" ptlinktgt="pt_replace" tabindex="74" href="javascript:submitAction_win0(document.win0,\'HRS_APPL_WRK_HRS_LST_NEXT\');" class="PSHYPERLINK">Next</a>']

Original Q&A

There is 1 answer

**Nima Soroush** · Accepted Answer · 2014-11-20T09:46:23+00:00

Scrapy does not execute JavaScript by itself, but there are several tools you can use to deal with JavaScript-driven pages:

- Splash - a JavaScript rendering service with an HTTP API. It is a lightweight browser implemented in Python using Twisted and Qt.
- Scrapyjs - a library that provides Scrapy-JavaScript integration through two different mechanisms: a Scrapy download handler and a Scrapy downloader middleware.
- SpiderMonkey - executes arbitrary JavaScript code from Python and lets you reference arbitrary Python objects and functions in the JavaScript VM.
- spynner - a stateful programmatic web browser module for Python, based on PyQt and WebKit. It supports JavaScript, AJAX, and every other technology that WebKit is able to handle (Flash, SVG, ...). Spynner takes advantage of jQuery, a powerful JavaScript library that makes interacting with pages and simulating events really easy.
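
As a rough illustration of the Splash route, the sketch below pulls the action name out of the `javascript:` href shown in the question and builds a POST request for Splash's `execute` endpoint, whose Lua script loads the page, runs the same `submitAction_win0(...)` call, and returns the resulting HTML. Everything here is an assumption rather than a tested recipe for this site: the Splash address `http://localhost:8050`, the wait time, and the helper names `extract_action_id`, `next_page_js` and `build_splash_request` are hypothetical, and a page that depends on session state may need more than this.

```python
import json
import re
from urllib.request import Request as HttpRequest

# Assumption: a Splash instance is listening here; adjust host and port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Lua script run by Splash: load the page, run the JavaScript that the
# Next link would trigger, wait for the page to update, return the HTML.
LUA_SCRIPT = """
function main(splash)
  assert(splash:go(splash.args.url))
  splash:wait(splash.args.wait)
  splash:runjs(splash.args.js)
  splash:wait(splash.args.wait)
  return splash:html()
end
"""

ACTION_RE = re.compile(r"submitAction_win0\(document\.win0,\s*'([^']+)'\)")


def extract_action_id(href):
    """Return the action name inside a javascript: href, or None."""
    match = ACTION_RE.search(href)
    return match.group(1) if match else None


def next_page_js(action_id):
    """Build the JavaScript call that the Next link's href contains."""
    return "submitAction_win0(document.win0,'%s');" % action_id


def build_splash_request(page_url, action_id, wait=2.0):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    payload = {
        "lua_source": LUA_SCRIPT,
        "url": page_url,
        "wait": wait,
        "js": next_page_js(action_id),
    }
    return HttpRequest(
        SPLASH_EXECUTE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request (for example with `urllib.request.urlopen`) should return the HTML after the Next action has run, which can then be parsed like any other response. The wait value is a guess; a page that replaces its content via AJAX may need a longer pause or an explicit check that the new rows have appeared.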
