TripAdvisor Web Scraping a specific value is not possible

Question

68 views Asked by NuraX At 12 February 2024 at 19:38

I'm new to web scraping and want to extract one specific value from a few TripAdvisor pages, such as the one in the code below. I need the cleanliness rating, which is 4,5 in this example. No matter which part of the HTML I target, I can't retrieve it. On sites like Booking or HolidayCheck the same approach works like a charm.

import requests
from lxml import html
import time

url = 'https://www.tripadvisor.com/Hotel_Review-g187399-d200757-Reviews-Pullman_Dresden_Newa_Hotel-Dresden_Saxony.html'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Sec-Ch-Ua': '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'
}

time.sleep(5)

response = requests.get(url, headers=headers)
tree = html.fromstring(response.content)


cleanliness_xpath = "//div[@class='uqMDf z BGJxv YGfmd YQkjl']//div[@class='ZPHZV']//div[@class='tJRnI']/span[contains(text(), 'Cleanliness')]/following-sibling::div[@class='BqYzr']/span[@class='MUlry']"

cleanliness_element = tree.xpath(cleanliness_xpath)

# Check whether a value was found
if cleanliness_element:
    cleanliness_rating = float(cleanliness_element[0].text) / 10  
    print(f"Cleanliness rating: {cleanliness_rating}")
else:
    print("Cleanliness rating not found")

There is 1 answer

**Varun** · Answer 1 · 2024-02-22T04:58:23+00:00

To get the cleanliness value, change the XPath assigned to cleanliness_xpath:

cleanliness_xpath = "//div[contains(@data-tab, 'TABS_ABOUT')]//span[contains(., 'Cleanliness')]/following-sibling::span"

This XPath worked for me.

Explanation:

//div[contains(@data-tab, 'TABS_ABOUT')] - Selects the About section of the page
//span[contains(., 'Cleanliness')] - Selects the span that labels the Cleanliness rating row

HTML of the Cleanliness rating row:

<div class="tJRnI">
        <span>Cleanliness</span>
        <div class="BqYzr">
                <div class="WXMiS" style="width:89.457368px"></div>
        </div>
        <span class="MUlry">4.5</span>
</div>

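In this HTML the rating span (class MUlry) comes after the Cleanliness span as its sibling. It is not a child of div.BqYzr, which is why the XPath in the question matches nothing.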

/following-sibling::span - Selects every later span that shares the same parent as the current span, i.e. sits at the same level
//div[contains(@data-tab, 'TABS_ABOUT')]//span[contains(., 'Cleanliness')]/following-sibling::span - Goes to the About section, then to the span containing the string Cleanliness, and from there selects the following span at the same level, which holds the rating.
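The difference between the two XPath expressions can be checked offline against the markup quoted above. The following is a minimal sketch, not a guaranteed match for the live page: the surrounding div with data-tab="TABS_ABOUT" is an assumption added so the answer's XPath has something to anchor on, and the question's XPath is shortened to its last steps because the outer class names are not part of the quoted HTML.

```python
from lxml import html

# The inner <div class="tJRnI"> block is the markup quoted in the answer.
# The outer data-tab wrapper is an assumption made for this sketch.
SAMPLE = """
<html><body>
<div data-tab="TABS_ABOUT">
  <div class="tJRnI">
    <span>Cleanliness</span>
    <div class="BqYzr">
      <div class="WXMiS" style="width:89.457368px"></div>
    </div>
    <span class="MUlry">4.5</span>
  </div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# XPath from the answer: label span, then its sibling span.
answer_xpath = (
    "//div[contains(@data-tab, 'TABS_ABOUT')]"
    "//span[contains(., 'Cleanliness')]"
    "/following-sibling::span"
)

# Tail of the XPath from the question: it expects the rating span
# inside div.BqYzr, where this markup has no span at all.
question_tail_xpath = (
    "//div[@class='tJRnI']/span[contains(text(), 'Cleanliness')]"
    "/following-sibling::div[@class='BqYzr']/span[@class='MUlry']"
)

ratings = [el.text for el in tree.xpath(answer_xpath)]
misses = tree.xpath(question_tail_xpath)

print(ratings)
print(misses)
```

On this fragment the answer's expression should return the single rating text, while the question's expression returns an empty list.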

Below is the output I got. It is 0.45 rather than 4.5 because your code divides the rating by 10.

Cleanliness rating: 0.45
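To print 4.5 instead, the division can simply be dropped. A small sketch of that conversion follows; parse_rating is a hypothetical helper name, and the comma handling is an assumption for pages served with a German-style decimal separator such as 4,5.

```python
def parse_rating(text):
    """Convert rating text such as '4.5' or '4,5' to a float.

    Hypothetical helper: the comma replacement is an assumption for
    locales that render the decimal separator as a comma.
    """
    cleaned = text.strip().replace(",", ".")
    return float(cleaned)  # no division by 10: the text already holds the rating


rating = parse_rating("4.5")
print(f"Cleanliness rating: {rating}")
```

In the original script this would replace the line that computes cleanliness_rating, with cleanliness_element[0].text passed as the argument.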
