How to run PhantomJS as a server and call it remotely?


This is probably a very basic question. I would like to run the headless browser PhantomJS as a server rather than as a command-line tool.

Once it is running I would like to call it remotely over HTTP. The only thing I need is to send a URL and get back the HTML output. I need it to generate HTML for an AJAX application to make it searchable.

Is this possible?


There are 2 answers

Artjom B. (BEST ANSWER)

You can run PhantomJS perfectly well as a web server, because it ships with the Web Server Module. The examples folder contains, among others, a server.js example. It runs standalone without any dependencies (no node required).

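var page = require('webpage').create(),
    server = require('webserver').create(),
    port = 8080; // assumed port; change as needed

// Assumed convention: requests look like /?url=<encoded target URL>
var service = server.listen(port, function (request, response) {
    console.log('Request received at ' + new Date());
    var match = /[?&]url=([^&]+)/.exec(request.url),
        someUrl = match ? decodeURIComponent(match[1]) : null;
    if (!someUrl) {
        response.statusCode = 400;
        response.write('Missing url parameter');
        response.close();
        return;
    }
    page.open(someUrl, function (status) {
        response.headers = {
            'Cache-Control': 'no-cache',
            'Content-Type': 'text/plain;charset=utf-8'
        };
        if (status !== 'success') {
            console.log('Unable to open ' + someUrl);
            response.statusCode = 502;
            response.write('Unable to open ' + someUrl);
        } else {
            response.statusCode = 200;
            // page.content is the page's HTML after its scripts have run
            response.write(page.content);
        }
        // Always close the response, otherwise the client hangs
        response.close();
    });
});

if (!service) {
    console.log('Could not start the server on port ' + port);
    phantom.exit(1);
}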
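Once such a server is running, calling it remotely is an ordinary HTTP GET. The following is a minimal client sketch for Node.js, assuming the server listens on port 8080 and takes the target address in a `url` query parameter. Both are assumptions for illustration, not something the answer specifies, and `buildRequestUrl` and `fetchRendered` are hypothetical helper names.

```javascript
var http = require('http');

// Build the request URL. The `url` query parameter is an assumed convention;
// the target is percent-encoded so its own query string survives the trip.
function buildRequestUrl(serverBase, targetUrl) {
  return serverBase + '/?url=' + encodeURIComponent(targetUrl);
}

// Fetch the rendered HTML and hand it to callback(err, body).
function fetchRendered(serverBase, targetUrl, callback) {
  http.get(buildRequestUrl(serverBase, targetUrl), function (res) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      var body = Buffer.concat(chunks).toString('utf8');
      if (res.statusCode !== 200) {
        return callback(new Error('Server replied ' + res.statusCode + ': ' + body));
      }
      callback(null, body);
    });
  }).on('error', callback);
}
```

A call would look like fetchRendered('http://localhost:8080', 'http://example.com/', function (err, html) { ... }), where html is whatever the server wrote into the response.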

If you want to drive PhantomJS from node.js, that is also easy to do with phantomjs-node, a PhantomJS bridge for node.

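var http = require('http');
var url = require('url');
var phantom = require('phantom');

// Uses the callback-style API of older phantomjs-node releases.
phantom.create(function (ph) {
  ph.createPage(function (page) {
    http.createServer(function (req, res) {
      // Assumed convention: requests look like /?url=<encoded target URL>
      var someURL = url.parse(req.url, true).query.url;
      if (!someURL) {
        res.writeHead(400, {'Content-Type': 'text/plain'});
        return res.end('Missing url parameter');
      }
      page.open(someURL, function (status) {
        if (status !== 'success') {
          res.writeHead(502, {'Content-Type': 'text/plain'});
          return res.end('Unable to open ' + someURL);
        }
        // Runs inside the page and returns the rendered HTML
        page.evaluate(function () {
          return document.documentElement.outerHTML;
        }, function (result) {
          res.writeHead(200, {'Content-Type': 'text/plain'});
          res.end(result);
        });
      });
    }).listen(8080);
  });
});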

Notes

You can use this as is, as long as you don't have multiple requests at the same time. If you do, you either need to synchronize the requests (because there is only one page object), or create a new page object for every request and close() it when you're done.
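The first option, synchronizing the requests, could be sketched for the node.js variant as a small promise-based queue that lets only one task touch the shared page at a time. This is an illustration under assumptions, not part of phantomjs-node: `createSerialQueue` and `enqueue` are hypothetical names, and each task is assumed to return a Promise that settles once the page is free again.

```javascript
// Returns an `enqueue(task)` function that runs tasks strictly one at a time,
// in the order they were enqueued. Each task is a function returning a Promise.
function createSerialQueue() {
  var tail = Promise.resolve();
  return function enqueue(task) {
    // Start this task only after the previous one has settled.
    var run = tail.then(function () { return task(); });
    // Swallow failures on the internal chain so one failed task
    // does not block the tasks queued behind it.
    tail = run.catch(function () {});
    // The caller still sees the task's own result or error.
    return run;
  };
}
```

In the request handler, the page.open call would be wrapped in a function that returns a Promise and passed to enqueue, so a second request waits until the first has finished with the page. The trade-off is throughput: requests are handled one after another, whereas a new page object per request lets them overlap at the cost of more memory.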

Jay Bhagat

The easiest way is to write a small script, in Python or something similarly simple, that starts the server, and to use Python websockets to communicate with it. A simple web form can then be used to submit a website address and get back the page source. Any automation can be done via cron jobs or, if you are on Windows, with the Task Scheduler to start the Python script automatically.