How do you access the unaltered source of a page via phantomjs

Question

Asked by MyStream on 26 June 2015 at 06:42

With PhantomJS you can get a copy of the DOM as modified by the parser. With a cURL call you can get the page as it was served, before parsing. That unparsed markup may contain errors that a browser silently corrects.

How do you get both the pre-parse source and the post-render content, so that you can compare them and see which fixes the browser applied automatically?

Is the best method to run diff on the two files, or does PhantomJS hold two copies of the content, the original and the modified one? I can't find the right phrasing to get an answer from Google, and a search here, https://stackoverflow.com/search?q=[phantomjs]+save+unaltered+source, didn't turn up any results.

I'd like to avoid a second call to the same page for bandwidth/efficiency reasons.


**Artjom B.** · Accepted Answer · 2015-06-26T13:04:48+00:00

There is no way to directly access the unaltered source (referred to as view-source in other browsers) in PhantomJS.

You could try to read the page from the PhantomJS cache (when run with the --disk-cache=true option), but there is an easier method. You can simply send an AJAX request from inside the page to get the source "on the wire", but then you need to handle redirects yourself.

var page = require('webpage').create(),
    fs = require('fs');

// Fetch the raw, unparsed source with a synchronous XHR issued from the page context.
function get(page, url) {
    return page.evaluate(function(url){
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, false); // false = synchronous request
        xhr.send(null);
        return xhr.responseText;
    }, url);
}

var url = 'http://example.com';

page.open(url, function(){
    var co = get(page, url);
    fs.write("original.html", co, 'w');           // source as served
    fs.write("rendered.html", page.content, 'w'); // DOM serialized after parsing
    phantom.exit();
});

Even with this simple script, the two files already differ, although the page does not involve any JavaScript.
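To see exactly which lines changed without shelling out to an external diff tool, the two strings can be compared in the script itself. The following is only a minimal sketch: `diffLines` is a hypothetical helper, not part of PhantomJS, and it uses a plain longest-common-subsequence table, so memory grows with the product of the two line counts. It is written in ES5 so that it should also run inside a PhantomJS script.

```javascript
// Hypothetical helper (not a PhantomJS API): line-by-line diff of two strings.
// Returns an array of lines prefixed with "- " (only in original)
// or "+ " (only in rendered). Unchanged lines are omitted.
function diffLines(original, rendered) {
    var a = original.split(/\r?\n/);
    var b = rendered.split(/\r?\n/);
    var i, j;

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
    var lcs = [];
    for (i = 0; i <= a.length; i++) {
        lcs[i] = [];
        for (j = 0; j <= b.length; j++) {
            lcs[i][j] = 0;
        }
    }
    for (i = a.length - 1; i >= 0; i--) {
        for (j = b.length - 1; j >= 0; j--) {
            lcs[i][j] = a[i] === b[j]
                ? lcs[i + 1][j + 1] + 1
                : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }

    // Walk the table from the top-left and collect the differing lines.
    var out = [];
    i = 0;
    j = 0;
    while (i < a.length && j < b.length) {
        if (a[i] === b[j]) {
            i++;
            j++;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            out.push('- ' + a[i++]);
        } else {
            out.push('+ ' + b[j++]);
        }
    }
    while (i < a.length) { out.push('- ' + a[i++]); }
    while (j < b.length) { out.push('+ ' + b[j++]); }
    return out;
}
```

In the script above, this could be called as `diffLines(co, page.content)` inside the `page.open` callback, and the joined result written out with `fs.write`. For large pages a dedicated diff tool run on the two saved files is likely the better choice.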


You might need to run PhantomJS with the --web-security=false option. Instead of always passing the URL into the get() function, you can let it fall back to page.url:

function get(page, url) {
    url = url || page.url;
    return page.evaluate(function(url){
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, false);
        xhr.send(null);
        return xhr.responseText;
    }, url);
}
