I'm trying to scrape data off of a website. There are certain datetimes that I'm interested in, but there are two problems:
1. They're not present in the page's original HTML; they're loaded later.
2. When loaded, they're displayed in a relative, imprecise human-readable form, so 7.3.2023 14:22 becomes 'in seven days'. Simply waiting for the page to finish loading is therefore a no-go as well.
When I open the Network panel in Chrome DevTools, I can pinpoint a request whose response contains the data in its proper, precise form.
Is there a way to programmatically access the content of these requests using headless Chrome or some other software? The best case would be a tool from the PHP ecosystem, but I guess going with JavaScript or something else is possible too, just inconvenient.
And no, I can't access the URL of the request directly. The webpage sends a ton of data I can't reasonably reproduce, not to mention there is surely going to be security preventing access from origins other than the original site.
Alright, this was quite a journey but I managed to nail it down.
The solution uses this excellent library: https://github.com/jakubkulhan/chrome-devtools-protocol
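For context, the Chrome DevTools Protocol flow that a library like this wraps looks roughly as follows: enable the Network domain, watch for `Network.responseReceived` events, and once the matching request has finished loading, call `Network.getResponseBody` with its `requestId`. The sketch below is in Python rather than PHP and works on already-decoded protocol messages, so it does not reflect the library's actual API. The function names, the sample URL, and the sample events are all hypothetical; only the protocol method and field names come from the DevTools Protocol itself.

```python
import base64
import json
from typing import Iterable, Optional


def find_request_id(events: Iterable[dict], url_fragment: str) -> Optional[str]:
    """Return the requestId of the first finished response whose URL contains url_fragment."""
    candidates = set()
    for event in events:
        method = event.get("method")
        params = event.get("params", {})
        if method == "Network.responseReceived":
            # Remember every response whose URL looks like the one we want.
            if url_fragment in params.get("response", {}).get("url", ""):
                candidates.add(params["requestId"])
        elif method == "Network.loadingFinished":
            # Wait for loadingFinished so the full body can be requested.
            if params.get("requestId") in candidates:
                return params["requestId"]
    return None


def get_response_body_command(command_id: int, request_id: str) -> str:
    """Build the JSON message that asks the browser for a response body."""
    return json.dumps({
        "id": command_id,
        "method": "Network.getResponseBody",
        "params": {"requestId": request_id},
    })


def decode_body(result: dict) -> str:
    """Decode the result of Network.getResponseBody into text."""
    body = result["body"]
    if result.get("base64Encoded"):
        return base64.b64decode(body).decode("utf-8")
    return body


# Hypothetical events, shaped like what the browser sends after Network.enable.
sample_events = [
    {"method": "Network.responseReceived",
     "params": {"requestId": "42.1", "response": {"url": "https://example.com/app.js"}}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "42.7", "response": {"url": "https://example.com/api/events"}}},
    {"method": "Network.loadingFinished", "params": {"requestId": "42.1"}},
    {"method": "Network.loadingFinished", "params": {"requestId": "42.7"}},
]

request_id = find_request_id(sample_events, "/api/events")
command = get_response_body_command(1, request_id)
```

Because the page itself issues the request, the cookies, headers, and origin checks are all handled by the browser, and the scraper only reads the response it already received.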
Here's my code: