I am using a CrawlSpider that recursively follows links, moving to the next page with a link extraction rule like:

rules = (Rule(LinkExtractor(

I have applied this strategy to recursively crawl different websites, and as long as the anchor tag contained text, like <a href="somelink">sometext</a>, everything worked fine.

I am now trying to scrape a website that has

<div class="bui-pagination__item bui-pagination__next-arrow">
  <a class="pagenext" href="/url.html" aria-label="Pagina successiva">
    <svg class="bk-icon -iconset-navarrow_right bui-pagination__icon" height="18" role="presentation" width="18" viewBox="0 0 128 128">
      <path d="M54.3 96a4 4 0 0 1-2.8-6.8L76.7 64 51.5 38.8a4 4 0 0 1 5.7-5.6L88 64 57.2 94.8a4 4 0 0 1-2.9 1.2z"></path>
    </svg>
  </a>
</div>

as a 'next' button instead of simple text. My LinkExtractor rule no longer seems to apply, and the spider stops after the first page.

I have tried to look for the svg element, but that doesn't seem to trigger the extraction:

restrict_xpaths=('//a[contains(.,name()=svg) and contains(@class,"nextpageclass")]'))
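For reference, the selection can be checked outside Scrapy. The sketch below is a minimal check, not a confirmed fix. It assumes the snippet above is representative of the HTML the spider actually receives, it uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) rather than Scrapy's own selectors, and `find_next_links` is a hypothetical helper name. The `<path>` data is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Pagination markup from the question; the <path> data is omitted for brevity.
SNIPPET = """
<div class="bui-pagination__item bui-pagination__next-arrow">
  <a class="pagenext" href="/url.html" aria-label="Pagina successiva">
    <svg class="bk-icon -iconset-navarrow_right bui-pagination__icon"
         height="18" role="presentation" width="18" viewBox="0 0 128 128">
      <path></path>
    </svg>
  </a>
</div>
"""


def find_next_links(markup):
    """Return the href of every <a class="pagenext"> that has an <svg> child.

    Hypothetical helper: the class name is taken from the markup in the
    question, not from the XPath expression that was tried.
    """
    root = ET.fromstring(markup)
    # [@class='pagenext'] matches the exact class attribute value;
    # [svg] keeps only anchors with a direct <svg> child element.
    anchors = root.findall(".//a[@class='pagenext'][svg]")
    return [a.get("href") for a in anchors]


next_links = find_next_links(SNIPPET)
print(next_links)
```

In full XPath 1.0 the same idea would be written as `//a[svg and contains(@class, "pagenext")]`. The predicate `contains(., name()=svg)` instead passes the boolean result of a comparison to `contains`, so it is unlikely to match any anchor. Whether that alone explains the spider stopping is an assumption, since the class in the expression (`nextpageclass`) also differs from the one in the markup (`pagenext`).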

Is there anything I am missing?
