My goal is to follow the Next link, whose target is `href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')"`. As an example I am using [this url][1]. At the end of that page there is a Next link; if you inspect its HTML, the link is written as `href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')"`, which shows up as `#` in the `href` attribute. I am just trying to collect those `href` values even though they are `#`.

    def parse(self,response):
        selector = Selector(response)
        links = []
        for link in selector.css('span.PSEDITBOX_DISPONLY').re('.*>(\d+)<.*'):
            #intjid = selector.css('span.PSEDITBOX_DISPONLY').re('.*>(\d+)<.*')
            abc = 'xxxx'
            #print abc
            yield Request(abc,callback=self.parse_listing_page,dont_filter=True)
            #meta={"use_splash": False}
            # )
        nav_page = selector.css('div#win0divHRS_APPL_WRK_HRS_LST_NEXT a').extract()
        print nav_page
        for nav_page in nav_page:
            ## To pass the url to parse function
            yield Request(urljoin('xxx',nav_page),self.parse,dont_filter=True)

When I run the above code I get "HTTP status code is not handled or not allowed", meaning nothing is scraped. Can anyone tell me how to follow the Next link through the `href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT')"` call, and why the result is empty? The HTML also looks odd to me; for example, on one of the pages the Next anchor is `<a id="HRS_APPL_WRK_HRS_LST_NEXT" class="PSHYPERLINK" href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');" tabindex="74" ptlinktgt="pt_replace" name="HRS_APPL_WRK_HRS_LST_NEXT"></a>`.

Thanks in advance.
Output:

    [u'<a name="HRS_APPL_WRK_HRS_LST_NEXT" id="HRS_APPL_WRK_HRS_LST_NEXT" ptlinktgt="pt_replace" tabindex="74" href="javascript:submitAction_win0(document.win0,\'HRS_APPL_WRK_HRS_LST_NEXT\');" class="PSHYPERLINK">Next</a>']

Scrapy does not execute JavaScript by itself, but there are a couple of mechanisms you can use to deal with JavaScript-driven links.
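
One such mechanism is to reproduce by hand the request that the JavaScript call would trigger. As a first step, here is a minimal sketch that pulls the form name and the action name out of the `javascript:submitAction_win0(...)` href. The helper name `parse_submit_action` and the regular expression are my own illustration, not part of Scrapy, and they assume the href always has the shape shown in the question.

```python
import re

# Matches hrefs shaped like the one in the question, for example:
#   javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');
# Assumption: the first argument is always document.<form> and the second
# is a single-quoted action name.
SUBMIT_ACTION_RE = re.compile(
    r"javascript:\s*submitAction_\w+\("
    r"\s*document\.(?P<form>\w+)\s*,"
    r"\s*'(?P<action>[^']+)'\s*\)"
)


def parse_submit_action(href):
    """Return (form_name, action_name) for a submitAction_* href, else None.

    Hypothetical helper for illustration only.
    """
    match = SUBMIT_ACTION_RE.search(href)
    if match is None:
        return None
    return match.group("form"), match.group("action")


href = "javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');"
print(parse_submit_action(href))
```

In a spider, the two values could then be passed to `FormRequest.from_response(response, formname=form, formdata={'ICAction': action}, callback=self.parse, dont_filter=True)`. Note that the `ICAction` field name is an assumption about what `submitAction_win0` does on this site; check the POST request in your browser's network tab when you click Next and mirror the fields you see there.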