Nightmare, PhantomJS and extracting page data

4.3k views Asked by At

I'm new to Nightmare/PhantomJS and am struggling to get a simple inventory of all the tags on a given page. I'm running on Ubuntu 14.04 after building PhantomJS from source and installing NodeJS, Nightmare and so forth manually, and other functions seem to be working as I expect.

Here's the code I'm using:

var Nightmare = require('nightmare');
new Nightmare()
  .goto("http://www.google.com")
  .wait()
  .evaluate(function () 
   {
     var a = document.getElementsByTagName("*");
     return(a);
   }, 
   function(i) 
   {
     for (var index = 0; index < i.length; index++)
     if (i[index])
        console.log("Element " + index + ": " + i[index].nodeName);
    })
  .run(function(err, nightmare) 
  {
     if (err) 
        console.log(err);
  }); 

When I run this inside a "real" browser, I get a list of all the tag types on the page (HTML, HEAD, BODY, ...). When I run this using node GetTags.js, I just get a single line of output:

Element 0: HTML

I'm sure it's a newbie problem, but what am I doing wrong here?

1

There are 1 answers

5
Artjom B. On BEST ANSWER

PhantomJS has two contexts. The page context which provides access to the DOM can only be accessed through evaluate(). So, variables must be explicitly passed in and out of the page context. But there is a limitation (docs):

Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.

Closures, functions, DOM nodes, etc. will not work!

Nightmare's evaluate() function is only a wrapper around the PhantomJS function of the same name. This means that you will need to work with the elements in the page context and only pass a representation to the outside. For example:

.evaluate(function () 
{
    var a = document.getElementsByTagName("div");
    return a.length;
}, 
function(i) 
{
    console.log(i + " divs available");
})