I'm trying to scrape a webpage after login.
If I use only BeautifulSoup and requests, I get:
Please enable JavaScript to continue using this application.
So I decided to use requests_html with the following code:
from requests_html import HTMLSession

session = HTMLSession()
session.get(url)
resp = session.post(loginUrl, data={"email": "[email protected]", "password": "Pass123"})
resp.html.render()
But I get either the same error or:
pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH
So I decided to use selenium, even though I'd really prefer requests because the script runs faster.
When I use selenium it works fine, but when I load selenium's page source into BeautifulSoup, I again get the
Please enable JavaScript to continue using this application.
error page.
Why? The page loads fine in the driver, and I'm just parsing the HTML page source that selenium returns.
How can I fix both the requests_html and BeautifulSoup errors?
You don't really need either pyppeteer or selenium. You can log in with a plain request and get all the data you want. The key here is to get the accessToken via the Login endpoint and then use it in subsequent requests. The API calls I'm making here are the meat of the page after logging in; the rest of the HTML is just eye-candy. The data coming from the API corresponds to what you see on the site.
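A minimal sketch of that pattern with plain requests (the login URL and the response field names below are assumptions; the real ones come from the site's network traffic):

import requests

# NOTE: the login URL and the JSON field names below are assumptions for
# illustration; check the site's network traffic for the real Login endpoint.
loginUrl = "https://example.com/api/Login"

session = requests.Session()

# Log in with a plain POST; no browser or JavaScript rendering involved.
login_response = session.post(
    loginUrl,
    json={"email": "[email protected]", "password": "Pass123"},
)
login_response.raise_for_status()

# Assuming the Login endpoint returns JSON with an accessToken field.
access_token = login_response.json()["accessToken"]

# Attach the token to every subsequent request the session makes.
session.headers["Authorization"] = f"Bearer {access_token}"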
As for the pyppeteer.errors.PageError: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH, this error is typically caused by an SSL/TLS handshake failure. The server you're trying to connect to may be using an outdated or unsupported SSL/TLS version or cipher suite. You can read more about the error here.
TL;DR: There's not much you can do about it.
I'd recommend using my approach (no browser, just API calls).
Benefits of the following approach:
- No browser to install or drive, so no pyppeteer or selenium errors to debug.
- Much faster than rendering pages, which is what you wanted from requests in the first place.
- The API returns structured data directly, so there's no HTML to parse with BeautifulSoup.
Here's how you can get the sale data:
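(The snippet below is a sketch, not the site's real API: the sales endpoint path and the shape of the login response are assumptions you'd confirm in the browser's network tab.)

import requests

# NOTE: every URL and field name below is an assumption for illustration;
# the real Login and sales endpoints come from the site's network traffic.
loginUrl = "https://example.com/api/Login"
salesUrl = "https://example.com/api/sales"

session = requests.Session()

# Step 1: log in and pull the accessToken out of the Login response.
login_response = session.post(
    loginUrl,
    json={"email": "[email protected]", "password": "Pass123"},
)
login_response.raise_for_status()
access_token = login_response.json()["accessToken"]

# Step 2: call the data endpoint with the token instead of scraping rendered HTML.
sales_response = session.get(
    salesUrl,
    headers={"Authorization": f"Bearer {access_token}"},
)
sales_response.raise_for_status()

# Step 3: the payload is already structured JSON, so there's no HTML to parse.
for sale in sales_response.json():
    print(sale)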
If you plug in your registration email and a valid password, you should see this:
There's plenty more in the sales_data table, like location, phone numbers, etc. Here's a sample: