Unable to load JavaScript and got pyppeteer error from webpage with requests


I'm trying to scrape a webpage after login.

If I use only BeautifulSoup and requests I get

Please enable JavaScript to continue using this application.

So, I decided to use requests_html with the following code:

from requests_html import HTMLSession

session = HTMLSession()

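# url and loginUrl are the site's landing-page and login URLs
session.get(url)
resp = session.post(loginUrl, data={"email": "[email protected]", "password": "Pass123"})

# Render the JavaScript on the page returned by the login request
resp.html.render()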

But I get the same error or:

pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH

So I decided to use Selenium, even though I'd much rather use requests because the script runs faster.

When I use selenium, it works fine, but when I load the selenium's page source into BeautifulSoup, I again get the

Please enable JavaScript to continue using this application.

error page.

Why? The page loads fine in the driver, and I'm just parsing the HTML that Selenium returns.

How can I fix both the requests_html and BeautifulSoup errors?

There is 1 answer

baduker On BEST ANSWER

You don't really need either pyppeteer or selenium. You can log in using a plain request and get all the data you want.

The key here is to get the accessToken via the Login endpoint and then use it in subsequent requests.
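As a rough sketch of that hand-off: the helper below turns a parsed login response into the header used for later calls. The name bearer_headers is hypothetical, and the response shape {"data": {"accessToken": ...}} is an assumption taken from what the script further down reads; the real API may return other fields or a different error format.

```python
from typing import Any, Dict


def bearer_headers(login_json: Dict[str, Any]) -> Dict[str, str]:
    """Build an Authorization header from a parsed login response.

    Assumes the shape {"data": {"accessToken": "..."}}; this helper is
    illustrative and not part of any library.
    """
    data = login_json.get("data") or {}
    token = data.get("accessToken")
    if not token:
        # A failed login would otherwise surface later as a confusing KeyError.
        raise ValueError("login response did not contain an access token")
    return {"Authorization": f"Bearer {token}"}
```

Passing the returned dict as headers= on each subsequent request is enough; no cookies or browser state are involved.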

The API calls I'm making here return the meat of the page after logging in; the rest of the HTML is just eye candy. The data coming from the API corresponds to what you see on the site.

As for the pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH, this error is typically caused by an SSL/TLS handshake failure. The server you're trying to connect to may be using an outdated or unsupported SSL/TLS version or cipher suite.
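If you want to see what the server actually negotiates, a minimal diagnostic sketch with Python's standard ssl module could look like the one below. The function names make_context and probe_tls are hypothetical, and the host you pass in is up to you. Note the caveat: this reports what Python's own TLS stack can agree on with the server, which may differ from what the Chromium build used by pyppeteer supports, so treat it as a hint rather than a verdict.

```python
import socket
import ssl
from typing import Optional, Tuple


def make_context(
    min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2,
) -> ssl.SSLContext:
    """Return a default client context that refuses anything older than min_version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    return ctx


def probe_tls(
    host: str, port: int = 443, timeout: float = 10.0
) -> Tuple[Optional[str], Optional[str]]:
    """Connect to host and return (protocol version, cipher name).

    Raises ssl.SSLError if the handshake fails, e.g. when client and
    server share no protocol version or cipher suite.
    """
    ctx = make_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cipher = tls.cipher()  # (name, protocol, secret bits) or None
            return tls.version(), cipher[0] if cipher else None
```

Calling probe_tls with the site's hostname either returns the negotiated version and cipher or raises ssl.SSLError, which would be consistent with a handshake-level mismatch.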

You can read more about the error here.

TL;DR: There's not much you can do about it.

I'd recommend using my approach (no browser, just API calls).

Benefits of the following approach:

  • lightweight
  • relatively fast
  • no SSL errors
  • full data

Here's how you can get the sale data:

# Third-party dependencies: pip install requests python-dateutil
import requests
from dateutil.parser import parse

login_url = "https://api-it.saywow.me/it-it/api/Users/Login"
sales_url = "https://api-it.saywow.me/it-it/api/Booking/GetCanBookSaleEvents"
payload = {
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}


def format_date(date: str) -> str:
    return parse(date).strftime("%d %B")


def show_sales(sales_data: list) -> None:
    for sale in sales_data:
        event = sale["saleEvent"]["saleEventName"]
        address = sale["saleEvent"]["addressFull"]
        start_date = format_date(sale["saleEvent"]["startDate"])
        end_date = format_date(sale["saleEvent"]["endDate"])
        is_booked = sale["isBooked"]

        template = f"""
Event: {event}
Address: {address}
Dates: {start_date} - {end_date}
Booked: {"Yes!" if is_booked else "You can book this event!"}
"""
        print(template)


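def main() -> None:
    with requests.Session() as session:
        # Log in; the JSON response carries the access token.
        response = session.post(login_url, json=payload)
        response.raise_for_status()  # fail early on an HTTP error
        token = response.json()["data"]["accessToken"]
        # Send the token as a Bearer header to the sales endpoint.
        sales = session.post(
            sales_url,
            headers={"Authorization": f"Bearer {token}"},
        )
        sales.raise_for_status()
        show_sales(sales.json()["data"])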


if __name__ == "__main__":
    main()

If you plug in your registration email and a valid password, you should see this:

Event: HOUSE OF LUXURY
Address: Viale John Fitzgerald Kennedy 54, Napoli NA
Dates: 08 December - 17 December
Booked: You can book this event!


Event: Monot Archive Sale
Address: Via Orobia 11, Milano MI
Dates: 28 November - 06 December
Booked: You can book this event!

There's plenty more in each entry of sales_data, like location, phone numbers, etc.

Here's a sample:

...

"addressName": "Via Orobia",
"addressNumber": "11",
"addressCity": "Milano",
"addressProvince": "MI",
"addressZip": "20139",
"addressCountry": "IT",
"addressLat": 45.4426322,
"addressLon": 9.2056631,

...
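To illustrate how those extra fields could be used, here is a small sketch that assembles a one-line postal address from them. The function name format_address is hypothetical, and where exactly these keys sit in the response (top level of each entry or nested under "saleEvent") is an assumption you'd need to check against the real JSON; the helper just takes whichever dict holds them.

```python
from typing import Any, Mapping


def format_address(record: Mapping[str, Any]) -> str:
    """Join the address* fields of one record into a single line.

    Missing or empty fields are skipped rather than rendered as 'None'.
    """
    street = " ".join(
        str(part)
        for part in (record.get("addressName"), record.get("addressNumber"))
        if part
    )
    city = " ".join(
        str(part)
        for part in (
            record.get("addressZip"),
            record.get("addressCity"),
            record.get("addressProvince"),
        )
        if part
    )
    return ", ".join(
        part for part in (street, city, record.get("addressCountry")) if part
    )


# Values copied from the sample above.
sample = {
    "addressName": "Via Orobia",
    "addressNumber": "11",
    "addressCity": "Milano",
    "addressProvince": "MI",
    "addressZip": "20139",
    "addressCountry": "IT",
}
print(format_address(sample))  # Via Orobia 11, 20139 Milano MI, IT
```

The same pattern extends to addressLat and addressLon if you want to plot the events on a map.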