Retrieve the HTML code of a Skyscanner result using PhantomJS


Since Skyscanner only provides its API to large commercial websites, I wanted to build a small application of my own to retrieve the results for multiple destinations, for my own (non-commercial) use.

Getting the result of a flight search turns out to be quite difficult, because the page takes a few seconds to complete the search and display the results.
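Because the results are rendered by JavaScript a few seconds after the page loads, reading `page.content` directly in the `page.open` callback can capture the page before any results exist. A common alternative to a fixed delay is to poll for a readiness condition. The sketch below follows the general `waitFor` polling pattern used in the PhantomJS examples; the function name, its parameters, and the `.results` selector in the usage comment are assumptions for illustration, not anything Skyscanner documents. It only addresses timing; it does not change how the site treats automated clients.

```javascript
// Poll `isReady` every `intervalMs` milliseconds (default 250). Call `onReady`
// as soon as it returns true, or `onTimeout` once `timeoutMs` has elapsed.
// Written in ES5 so it also runs inside PhantomJS.
function waitFor(isReady, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = new Date().getTime();
    var timer = setInterval(function () {
        if (isReady()) {
            clearInterval(timer);   // stop polling before handing over
            onReady();
        } else if (new Date().getTime() - start >= timeoutMs) {
            clearInterval(timer);   // give up after the time budget
            onTimeout();
        }
    }, intervalMs || 250);
}

// Hypothetical usage inside the PhantomJS script. The '.results' selector
// is an assumption; inspect the real page to find a suitable element.
//
// page.open(address, function (status) {
//     waitFor(function () {
//         return page.evaluate(function () {
//             return document.querySelector('.results') !== null;
//         });
//     }, function () {
//         console.log(page.content);
//         phantom.exit();
//     }, function () {
//         console.log('Timed out waiting for results');
//         phantom.exit(1);
//     }, 20000);
// });
```

Polling with a timeout returns as soon as the content appears and fails explicitly when it never does, whereas a fixed sleep either wastes time or fires too early.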

Using wget, lynx, links2 or edbrowse didn't work: each time I got a page saying that JavaScript is not enabled in my browser, even when links2 was compiled with JavaScript support. Maybe I did something wrong; I don't know.

PhantomJS has given the best results so far, and I have tried multiple code fragments to retrieve the flight search results.

I based my attempts on the following sources:

[Stackoverflow#1][1]
[Stackoverflow#2][2]
[Techslides][3]
[Stackoverflow#3][4]
[Stackoverflow#4][5]

  [1]: http://stackoverflow.com/questions/18526140/how-to-get-html-generated-from-javascript-using-phantomjs
  [2]: http://stackoverflow.com/questions/28209509/get-javascript-rendered-html-source-using-phantomjs
  [3]: http://techslides.com/grabbing-html-source-code-with-phantomjs-or-casperjs
  [4]: http://stackoverflow.com/questions/12450868/how-to-print-html-source-to-console-with-phantomjs
  [5]: http://stackoverflow.com/questions/8692038/phantomjs-page-dump-script-issue

Even with the time lag described in [Stackoverflow#4][5], it did not work. Whenever the scripts returned successfully at all, they returned only a Skyscanner error page saying that they had encountered a problem.

My last attempt, which also resulted in this error page, was:

// PhantomJS script: open the search results page and save the HTML it renders.
var page = require('webpage').create();
var fs = require('fs');

var url = 'http://www.skyscanner.at/transport/fluge/nyca/lax/150626/150627/flugpreise-von-new-york-nach-los-angeles-international-im-juni-2015.html?adults=1&children=0&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false';

var address = encodeURI(url);
page.open(address, function (status) {
    if (status !== 'success') {
        console.log('FAIL to load the address');
    } else {
        var f = null;
        // page.content is the current DOM serialized as HTML
        var markup = page.content;
        console.log(markup);
        try {
            // "w" creates the file or truncates an existing one
            f = fs.open('htmlcode.txt', "w");
            f.write(markup);
            f.close();
        } catch (e) {
            console.log(e);
        }
    }
    phantom.exit();
});

Has anyone tried something like this and succeeded? How did you get it working? I am trying to build a PHP-based and/or shell-script-based solution on a GUI-less Debian Linux system.

Answer by iain:

I work in engineering at Skyscanner. This isn't an answer to your question, but if you end up on that error page (or a CAPTCHA page), it is likely that our bot blocker is catching you, which is kind of "by design" :)

I can get you an API key, with a conservative rate limit. Would that be of interest?

Cheers,

Iain