I want to parse data from https://announcements.bybit.com/en-US/?category=new_crypto&page=1, and I need the page's JavaScript to be executed (this is a requirement of the test task). I use this code:
from requests_html import HTMLSession
from fake_useragent import UserAgent
from bs4 import BeautifulSoup as BS

ua = UserAgent()
headers = {
    'User-Agent': ua.random,
    'Accept': '*/*'
}
url = "https://announcements.bybit.com/en-US/?category=new_crypto&page=1"
session = HTMLSession()
r = session.get(url, headers=headers)
r.html.render()
Without r.html.render() it works without errors, returning the following:
But when I try to render the JS, the following exception is raised:
Traceback (most recent call last):
  File "/home/anlucka/PycharmProjects/test_parse/test_parse/main.py", line 16, in <module>
    r.html.render()
  File "/home/anlucka/PycharmProjects/test_parse/.venv/lib/python3.10/site-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/anlucka/PycharmProjects/test_parse/.venv/lib/python3.10/site-packages/requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "/home/anlucka/PycharmProjects/test_parse/.venv/lib/python3.10/site-packages/pyppeteer/page.py", line 831, in goto
    raise PageError(result)
pyppeteer.errors.PageError: net::ERR_SPDY_PROTOCOL_ERROR at https://announcements.bybit.com/en-US/?category=new_crypto&page=1
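
The traceback shows the error is raised inside page.goto(url, ...), so it comes from the headless Chromium that render() starts and not from the session.get() call. That browser opens its own connection, and the headers passed to session.get() are not applied to it. ERR_SPDY_PROTOCOL_ERROR is Chromium's name for an HTTP/2-level failure. Below is a minimal sketch of one thing to try, not a confirmed fix. It assumes the site rejects the headless browser's default User-Agent or its HTTP/2 connection. build_browser_args and fetch_rendered are hypothetical helper names. The sketch relies on HTMLSession accepting a browser_args list that is passed through to Chromium, and on the bundled Chromium honouring the --user-agent and --disable-http2 switches.

```python
def build_browser_args(user_agent, disable_http2=True):
    """Build Chromium command-line flags for the headless browser.

    Hypothetical helper: the flag choice is an assumption about the cause.
    """
    # --no-sandbox is the default that requests_html itself passes.
    args = ["--no-sandbox", f"--user-agent={user_agent}"]
    if disable_http2:
        # Assumption: forcing HTTP/1.1 avoids the SPDY/HTTP2 protocol error.
        args.append("--disable-http2")
    return args


def fetch_rendered(url, user_agent):
    """Fetch the URL and return the HTML after JavaScript has run."""
    # Imported here so build_browser_args can be used without requests_html.
    from requests_html import HTMLSession

    session = HTMLSession(browser_args=build_browser_args(user_agent))
    r = session.get(url, headers={"User-Agent": user_agent})
    # render() re-opens the URL in Chromium started with the flags above.
    r.html.render(timeout=30)
    return r.html.html
```

Calling fetch_rendered(url, ua.random) with the url and ua objects from the code above would exercise it. If the same exception is still raised, the assumption about the User-Agent or HTTP/2 is probably wrong.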