Scrapy: how to get information from all tabs on the page?

Question

234 views Asked by Alex At 25 December 2024 at 22:53

On this page I need to get information from all the tabs (Profile, Reviews, Phone Numbers & Directions).

wellness.py

def profile(self, response):
    services = response.xpath('.//span[contains(text(),"Services")]')
    education = response.xpath('.//span[contains(text(),"Education")]')
    training = response.xpath('.//span[contains(text(),"Training")]')

    yield {
        'First and Last name': response.css('h1::text').get(),
        'About': response.css('.listing-about::text').get(),
        'Services': services.xpath('following-sibling::span[1]/text()').extract(),
        'Primary Specialty': response.css('.normal::text').get(),
        'Address': ' '.join([i.strip() for i in response.css('.office-address span::text').getall()]),
        'Practice': response.css('.years-in-service::text').get(),
        'Education': education.xpath('following-sibling::span[1]/text()').extract(),
        'Training': training.xpath('following-sibling::span[1]/text()').extract(),
        'Consumer Feedback': response.css('.item-rating-container a::text').get(),
    }

Original Q&A

There is 1 answer

**ThePyGuy** · Accepted Answer · 2020-03-09T15:49:19+00:00

Each tab loads a separate page (URL); the tabs only make it look like a single page. So you have to collect the data you want from the first page, then request the second page and collect its data, then request the third. You carry the data from one page to the next by passing the item in the request's meta attribute. This is how I would do it. Note that the selectors for the tab links are correct, but you will have to write the selectors for the data points on each page yourself.

import scrapy

# The three callbacks below are methods of the spider class.
def profile(self, response):
    item = {}
    item["field1"] = response.xpath('//xpath').get()  # placeholder selector
    # Get the link to the Reviews tab
    review_link = response.css('#reviews_tab a::attr(href)').get()
    yield scrapy.Request(response.urljoin(review_link), callback=self.parse_reviews, meta={'item': item})

def parse_reviews(self, response):
    # Pick up the item started in profile()
    item = response.meta['item']
    item["field2"] = response.xpath('//xpath').get()  # placeholder selector
    # Get the link to the Directions tab
    directions_link = response.css('#directions_tab a::attr(href)').get()
    yield scrapy.Request(response.urljoin(directions_link), callback=self.parse_directions, meta={'item': item})

def parse_directions(self, response):
    item = response.meta['item']
    item['directions'] = response.xpath('//xpath').get()  # placeholder selector
    # All three tabs have been visited, so emit the finished item
    yield item
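
The callbacks above rely on response.urljoin() to turn each tab's href into an absolute URL. As a minimal sketch of that step, the snippet below uses the standard library's urljoin, which behaves roughly the same way for a page without a <base> tag. The helper name tab_url, the example.com URL and the hrefs are all hypothetical, chosen only for illustration. The helper also returns None when the href is missing, because joining a missing href simply gives back the page's own URL, and the spider would then request the same page again instead of the next tab.

```python
from urllib.parse import urljoin


def tab_url(page_url, href):
    """Return an absolute URL for a tab link, or None if the link is missing.

    Hypothetical helper: resolves a relative href against the page URL,
    but refuses a missing or empty href instead of returning the page URL.
    """
    if not href:
        return None
    return urljoin(page_url, href)


# Hypothetical profile URL and hrefs, for illustration only.
page = "https://example.com/doctors/jane-doe/profile"

print(tab_url(page, "reviews"))                       # relative href
print(tab_url(page, "/doctors/jane-doe/directions"))  # root-relative href
print(tab_url(page, None))                            # selector matched nothing
```

In a spider, the same check could be applied to review_link and directions_link before yielding the next request, so that an item is still yielded when a tab link is absent. Since Scrapy 1.7, the cb_kwargs argument of scrapy.Request is an alternative to meta for handing the item to the next callback.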
