Scenario:
I have a target website that I need to crawl and take a screenshot of the personal account feed.
Needs:
- Log in to the website.
- Browse to the personal area.
- Crawl the page.
Code:
require 'vendor/autoload.php';
use JonnyW\PhantomJs\Client;
$client = Client::getInstance();
$client->getEngine()->setPath('C:\xampp\htdocs\phantomjs\bin\phantomjs.exe');
$client->getProcedureCompiler()->clearCache();
$client->isLazy();
$delay = 15; // delay in seconds before capture
$width = 1366;
$height = 768;
$top = 0;
$left = 0;
$request = $client->getMessageFactory()->createCaptureRequest();
$response = $client->getMessageFactory()->createResponse();
$request->setDelay($delay);
$request->setTimeout(10000);
$data = array(
'login' => '***',
'password' => '***',
);
$request->setMethod('POST');
$request->setUrl('login-url');
$request->setRequestData($data); // Set post data
$request->setOutputFile('screenshot.jpg');
$request->setViewportSize($width, $height);
$request->setCaptureDimensions($width, $height, $top, $left);
$client->send($request, $response);
$file = fopen("1.txt","a");
fwrite($file,$response->getContent());
fclose($file);
Question:
How can I browse to the personal page URL without losing the cookies and the session?
I have already tried simply changing setUrl a second time on the same request, but it does not work:
$request->setMethod('GET');
$request->setUrl('personal-page-url');
$request->setOutputFile('screenshot1.jpg');
$client->send($request, $response);
$file = fopen("2.txt","a");
fwrite($file,$response->getContent());
fclose($file);
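One possible workaround, sketched under the assumption that your php-phantomjs version exposes the engine's command-line options, is to make PhantomJS persist its cookies to disk with the standard `--cookies-file` flag. Both requests then share the same cookie store, so the session from the POST login survives into the second GET. The file path and the `'login-url'` / `'personal-page-url'` placeholders are illustrative, not tested against your site:

```php
<?php
require 'vendor/autoload.php';

use JonnyW\PhantomJs\Client;

$client = Client::getInstance();
$client->getEngine()->setPath('C:\xampp\htdocs\phantomjs\bin\phantomjs.exe');
// Persist cookies between runs; --cookies-file is a standard PhantomJS option.
$client->getEngine()->addOption('--cookies-file=C:\xampp\htdocs\phantomjs\cookies.txt');

// 1) POST the login form; PhantomJS writes the session cookies to cookies.txt.
$login = $client->getMessageFactory()->createCaptureRequest('login-url', 'POST');
$login->setRequestData(array('login' => '***', 'password' => '***'));
$login->setOutputFile('screenshot.jpg');
$client->send($login, $client->getMessageFactory()->createResponse());

// 2) A fresh GET request; the engine reloads cookies.txt, keeping the session.
$feed = $client->getMessageFactory()->createCaptureRequest('personal-page-url', 'GET');
$feed->setOutputFile('screenshot1.jpg');
$client->send($feed, $client->getMessageFactory()->createResponse());
```

Creating a second request object, rather than reusing the first, also avoids carrying over the POST body and capture settings from the login step.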
According to this issue on GitHub, there is a still-unfixed problem with cookies; you can follow it there.
Alternatively, if your target page does not rely heavily on AJAX, you can use other scraping approaches; if you really need JavaScript to run, you can use other web drivers for PHP.
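As one example of such a web driver, here is a minimal sketch using php-webdriver against a Selenium server. Because the whole flow runs inside one real browser session, cookies survive the navigation automatically. The Selenium URL, form field names, and page URLs are assumptions you would adapt to your site:

```php
<?php
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;

// Assumes a Selenium server is running locally on the default port.
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());

// Log in through the real form; the browser keeps the session cookies.
$driver->get('login-url');
$driver->findElement(WebDriverBy::name('login'))->sendKeys('***');
$driver->findElement(WebDriverBy::name('password'))->sendKeys('***');
$driver->findElement(WebDriverBy::name('password'))->submit();

// Same browser session, so the cookies are still present here.
$driver->get('personal-page-url');
$driver->takeScreenshot('screenshot1.png');

$driver->quit();
```

The field names `login` and `password` are hypothetical; inspect the site's login form to get the real ones.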