How to capture the runtime html content/state with all styles applied and javascript removed


Shorter version of Question

Is there any way to capture the runtime HTML (the current state of the DOM) of a dynamic (AJAX/JavaScript) web page, with all styles applied or inlined?


Longer version

What I would like to do is save the current state of the DOM as a single HTML page, with all styles either embedded in a <style> tag or inlined.

By analogy, what I need is something that resembles a core dump file (which gives the current state of an application), but in this case an HTML file that can be loaded into the browser to view and debug.

This task would be pretty easy if there were no JavaScript in the page:

  • Use File -> Save Page As -> HTML complete in the web browser
  • Or use tools like http://www.httrack.com or curl to download the page and all linked images

The following lists, at a high level, what could be done to achieve the same programmatically (though it is not a complete solution):

  • get the html content
  • remove all <script> tags and onXXX attributes (such as onclick) from each element
  • get the *.css contents
  • embed the CSS styles within a <style> tag
  • change all image paths to relative
  • save all images
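
For a static page, the cleanup steps in this list might be sketched roughly as follows. This is a minimal sketch rather than a complete solution: `stripScripts` and `embedCss` are hypothetical helper names, the regular expressions are an assumption that only holds for reasonably well-formed markup (a real HTML parser is more robust), and downloading the CSS and images is left out.

```javascript
// Hypothetical helpers; regex-based cleanup is fragile, so treat this as a sketch only.

// Remove <script> elements and inline onXXX handlers (onclick, onload, ...).
// Caveat: the second pattern can also match "onfoo=..." text outside of tags.
function stripScripts(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    .replace(/\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '');
}

// Embed already-downloaded CSS text in a <style> element at the end of <head>.
function embedCss(html, cssText) {
  var styleTag = '<style>\n' + cssText + '\n</style>';
  if (/<\/head>/i.test(html)) {
    // A function replacement keeps "$" in the CSS from being read as a replacement pattern.
    return html.replace(/<\/head>/i, function () { return styleTag + '</head>'; });
  }
  // No <head> found: fall back to prepending the styles.
  return styleTag + html;
}

var page = '<html><head></head><body onload="init()"><p>Hi</p>' +
  '<script src="app.js"></script></body></html>';
console.log(embedCss(stripScripts(page), 'p { color: red; }'));
```

As the question notes, this only covers markup as delivered by the server; it does not see anything that scripts add or change after the page loads.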

But when JavaScript is used to build the page, or when the state of the page is altered on load or on click (for example, some elements are hidden and the layout changes), the methods above cannot be used.

So how can I save the current state of a dynamic, JavaScript-enabled web page?

If such a method, tool, or plugin exists, it would be handy for emailing or sharing the page with someone who does not have access to the internet or to the web application.
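
One possible way to capture the live state described in this question is to clone the DOM in the browser and copy each element's computed style onto the clone. The sketch below assumes it is run from the browser console; `snapshotDom` and `computedStyleToCssText` are hypothetical names, and treating every attribute that starts with "on" as an event handler is an approximation.

```javascript
// Sketch for the browser console; function names are illustrative, not a library API.

// Turn a CSSStyleDeclaration (as returned by getComputedStyle) into "prop: value;" text.
function computedStyleToCssText(computed) {
  var decls = [];
  for (var i = 0; i < computed.length; i++) {
    var prop = computed[i];
    decls.push(prop + ': ' + computed.getPropertyValue(prop) + ';');
  }
  return decls.join(' ');
}

// Clone the live DOM, inline every element's computed style on the clone,
// then strip scripts and inline handlers from the clone only.
function snapshotDom(doc, win) {
  var root = doc.documentElement;
  var clone = root.cloneNode(true);
  // Both lists are in document order, so index i refers to the same element in each.
  var originals = [root].concat(Array.prototype.slice.call(root.querySelectorAll('*')));
  var clones = [clone].concat(Array.prototype.slice.call(clone.querySelectorAll('*')));
  originals.forEach(function (el, i) {
    clones[i].setAttribute('style', computedStyleToCssText(win.getComputedStyle(el)));
  });
  clones.forEach(function (el) {
    // Assumption: any attribute starting with "on" is an event handler.
    Array.prototype.slice.call(el.attributes).forEach(function (attr) {
      if (attr.name.indexOf('on') === 0) el.removeAttribute(attr.name);
    });
    if (el.tagName.toLowerCase() === 'script') el.parentNode.removeChild(el);
  });
  return '<!DOCTYPE html>\n' + clone.outerHTML;
}

// In a browser: var html = snapshotDom(document, window);
```

This sketch does not capture pseudo-element styles, values typed into form fields, canvas contents, or images, which would still be referenced by URL.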


There are 2 answers

Artemiy

I am assuming you want to do this on the desktop. In that case I would use a browser, preferably a headless one such as PhantomJS, with its cross-origin (same-origin policy) restrictions disabled; in PhantomJS that is the --web-security=false option. Write JavaScript that loads your target URL into an iframe, reads its DOM, and saves it to a file. You will still need to save all the CSS manually; I am not sure how to inline it.
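
A rough PhantomJS sketch of this idea is shown below. It is a variation on the answer, not the answer's exact method: it opens the target URL directly instead of through an iframe, so it should not need the cross-origin restrictions relaxed. The name `capturePage`, the two-second wait, and the command-line argument layout are all assumptions.

```javascript
// Sketch of a PhantomJS script; it only does work when run by the phantomjs binary.
// Possible invocation (file name is hypothetical): phantomjs capture.js <url> <output-file>

function capturePage(url, outPath, done) {
  var page = require('webpage').create();
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('Could not load ' + url));
      return;
    }
    // Assumed delay: give the page's own scripts time to finish building the DOM.
    setTimeout(function () {
      // Runs inside the page and returns the serialized current DOM.
      var html = page.evaluate(function () {
        return document.documentElement.outerHTML;
      });
      require('fs').write(outPath, html, 'w');
      done(null);
    }, 2000);
  });
}

// Only start when the PhantomJS global is present.
if (typeof phantom !== 'undefined') {
  var args = require('system').args;
  capturePage(args[1], args[2], function (err) {
    if (err) console.log(err.message);
    phantom.exit(err ? 1 : 0);
  });
}
```

The saved file still refers to external stylesheets and images, so the CSS caveat from the answer applies here too.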

jcdietrich

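As for getting the current state of the DOM: you can use jQuery's .html(), which returns the markup currently inside the matched element. For $('html') that is everything inside the <html> tag, without the tag itself or the doctype.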

var DOMState = $('html').html();

You could write a bookmarklet that injects jQuery (see http://blog.reybango.com/2010/09/02/how-to-easily-inject-jquery-into-any-web-page/ for an example) and then captures the HTML.
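
A bookmarklet along these lines could look like the sketch below. Note the swap: it uses the native `document.documentElement.outerHTML` instead of jQuery's `.html()`, so no jQuery injection is needed. `saveSnapshot`, `toBookmarklet`, and the file name `snapshot.html` are hypothetical.

```javascript
// Hypothetical sketch: saveSnapshot and toBookmarklet are illustrative names.

// Runs inside the page: serializes the current DOM and offers it as a download.
function saveSnapshot() {
  var html = '<!DOCTYPE html>\n' + document.documentElement.outerHTML;
  var blob = new Blob([html], { type: 'text/html' });
  var link = document.createElement('a');
  link.href = URL.createObjectURL(blob);
  link.download = 'snapshot.html';
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
}

// Wraps a function's source in a javascript: URL that can be saved as a bookmark.
function toBookmarklet(fn) {
  return 'javascript:' + encodeURIComponent('(' + fn.toString() + ')();');
}

// Print the URL to paste into a bookmark's address field.
console.log(toBookmarklet(saveSnapshot));
```

The captured markup still contains the page's scripts and external stylesheet links; removing scripts and inlining styles would be a separate step.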