TripAdvisor web scraping: getting a specific value is not possible


I'm new to web scraping and want to get a specific value from a few TripAdvisor pages, such as the hotel page in the code below. I need the cleanliness rating, which is 4,5 in this example. No matter which part of the HTML I target, my script cannot find it. On sites like Booking or HolidayCheck the same approach works like a charm.

Value needed is 4,5

```python
import requests
from lxml import html
import time

url = 'https://www.tripadvisor.com/Hotel_Review-g187399-d200757-Reviews-Pullman_Dresden_Newa_Hotel-Dresden_Saxony.html'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Sec-Ch-Ua': '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'
}

time.sleep(5)

response = requests.get(url, headers=headers)
tree = html.fromstring(response.content)

cleanliness_xpath = "//div[@class='uqMDf z BGJxv YGfmd YQkjl']//div[@class='ZPHZV']//div[@class='tJRnI']/span[contains(text(), 'Cleanliness')]/following-sibling::div[@class='BqYzr']/span[@class='MUlry']"

cleanliness_element = tree.xpath(cleanliness_xpath)

# Check whether a value was found
if cleanliness_element:
    cleanliness_rating = float(cleanliness_element[0].text) / 10
    print(f"Cleanliness rating: {cleanliness_rating}")
else:
    print("Cleanliness rating not found")
```


There is 1 answer.

Answer by Varun:

To get the cleanliness value, change the XPath stored in cleanliness_xpath to:

```python
cleanliness_xpath = "//div[contains(@data-tab, 'TABS_ABOUT')]//span[contains(., 'Cleanliness')]/following-sibling::span"
```

For me, the XPath above worked fine.
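The live page cannot be fetched in an offline example, so the sketch below applies the same XPath to a small hand-written document instead. It assumes that the rating row shown in this answer sits inside a div whose data-tab attribute contains TABS_ABOUT. SAMPLE_PAGE is a hypothetical stand-in, and the real TripAdvisor markup, including its generated class names, may differ or change over time.

```python
from lxml import html

# Hypothetical sample page (assumed structure): the rating row from the
# answer, wrapped in a div whose data-tab attribute contains 'TABS_ABOUT'.
SAMPLE_PAGE = """
<html><body>
  <div data-tab="TABS_ABOUT">
    <div class="tJRnI">
      <span>Cleanliness</span>
      <div class="BqYzr"><div class="WXMiS" style="width:89.457368px"></div></div>
      <span class="MUlry">4.5</span>
    </div>
  </div>
</body></html>
"""

cleanliness_xpath = "//div[contains(@data-tab, 'TABS_ABOUT')]//span[contains(., 'Cleanliness')]/following-sibling::span"

tree = html.fromstring(SAMPLE_PAGE)
matches = tree.xpath(cleanliness_xpath)

if matches:
    # The span text is already the displayed rating, so it is not divided.
    cleanliness_rating = float(matches[0].text)
    print(f"Cleanliness rating: {cleanliness_rating}")  # Cleanliness rating: 4.5
else:
    print("Cleanliness rating not found")
```

On the real page you would build tree from response.content, as the question's code does, and keep the rest unchanged.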

Explanation:

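//div[contains(@data-tab, 'TABS_ABOUT')] - selects the About section of the page, i.e. the div whose data-tab attribute contains TABS_ABOUT
//span[contains(., 'Cleanliness')] - within that section, selects the span whose text contains Cleanliness, i.e. the label of the Cleanliness rating row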

HTML of the Cleanliness rating row:

```html
<div class="tJRnI">
        <span>Cleanliness</span>
        <div class="BqYzr">
                <div class="WXMiS" style="width:89.457368px"></div>
        </div>
        <span class="MUlry">4.5</span>
</div>
```

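As the HTML above shows, the rating is in the span with class MUlry that comes after the Cleanliness label span. That span is a sibling of the div with class BqYzr, not a child of it. This is one reason the XPath in the question finds nothing: it looks for span[@class='MUlry'] inside div[@class='BqYzr'].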
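One way to confirm the structural mismatch is to run two versions of the final steps against the rating row on its own. The sketch below is an offline check that uses the HTML shown above. ROW, original_tail and sibling_tail are illustrative names. The first expression is the tail of the question's XPath, starting from div.tJRnI.

```python
from lxml import html

# The rating row as shown in the answer.
ROW = """
<div class="tJRnI">
  <span>Cleanliness</span>
  <div class="BqYzr"><div class="WXMiS" style="width:89.457368px"></div></div>
  <span class="MUlry">4.5</span>
</div>
"""

row = html.fromstring(ROW)  # the div.tJRnI element

# Tail of the question's XPath: expects span.MUlry INSIDE div.BqYzr.
original_tail = "./span[contains(text(), 'Cleanliness')]/following-sibling::div[@class='BqYzr']/span[@class='MUlry']"
# Same starting point, but treats span.MUlry as a sibling of the label span.
sibling_tail = "./span[contains(text(), 'Cleanliness')]/following-sibling::span[@class='MUlry']"

print(row.xpath(original_tail))                   # []
print([s.text for s in row.xpath(sibling_tail)])  # ['4.5']
```

The first query returns an empty list because div.BqYzr only contains the bar graphic (div.WXMiS), not the rating span.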

/following-sibling::span - selects the span elements that share the current element's parent and come after it, so it skips div.BqYzr and matches the span holding the rating
//div[contains(@data-tab, 'TABS_ABOUT')]//span[contains(., 'Cleanliness')]/following-sibling::span - taken together, this goes to the About section, finds the span whose text contains Cleanliness, and then returns the span that follows it at the same level
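The sketch below shows exactly what the following-sibling axis returns. It uses the rating row from the answer's HTML and illustrative variable names, and it lists the elements that follow the Cleanliness label, first without and then with the span filter.

```python
from lxml import html

# The rating row as shown in the answer.
ROW = """
<div class="tJRnI">
  <span>Cleanliness</span>
  <div class="BqYzr"><div class="WXMiS" style="width:89.457368px"></div></div>
  <span class="MUlry">4.5</span>
</div>
"""

row = html.fromstring(ROW)
label = row.xpath("./span[contains(., 'Cleanliness')]")[0]

# following-sibling::* returns every later element with the same parent.
# div.WXMiS is a child of div.BqYzr, not a sibling, so it is not listed.
all_siblings = [el.tag for el in label.xpath("following-sibling::*")]

# following-sibling::span keeps only the span elements among those siblings.
span_siblings = [el.text for el in label.xpath("following-sibling::span")]

print(all_siblings)   # ['div', 'span']
print(span_siblings)  # ['4.5']
```

The axis only looks at elements under the same parent. If each rating row has its own div.tJRnI, as in the snippet, the match therefore cannot pick up the value from the next row.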

Below is the output I got. It is 0.45 rather than 4.5 because your code divides the rating by 10. Drop the "/ 10" to get 4.5.

```
Cleanliness rating: 0.45
```
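The sketch below is a minimal version of that fix: it parses the span text directly instead of dividing it by 10. The comma handling is an assumption. The question shows the value as 4,5, presumably on a localized page, while the HTML in this answer contains 4.5. parse_rating is a hypothetical helper name.

```python
def parse_rating(text):
    """Convert a rating string such as '4.5' or '4,5' to a float.

    The value is returned as displayed (no division by 10). Replacing a
    decimal comma with a dot is an assumption for localized pages.
    """
    cleaned = text.strip().replace(",", ".")
    return float(cleaned)


print(f"Cleanliness rating: {parse_rating('4.5')}")  # Cleanliness rating: 4.5
print(f"Cleanliness rating: {parse_rating('4,5')}")  # Cleanliness rating: 4.5
```

In the question's code, this would replace float(cleanliness_element[0].text) / 10 with parse_rating(cleanliness_element[0].text).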