The benchmark compares querySelectorAll (QSA) plus .forEach against a NodeIterator:
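// toArray is assumed to be a helper (not shown in the post) that copies
// the NodeList into a real Array so that .forEach is available
toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
  // do something with node
});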
var filter = {
    acceptNode: function (node) {
        var condition = node.parentNode.tagName === "DIV" &&
            node.classList.contains("klass") &&
            node.tagName === "A";
        return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }
};
// All four arguments are passed explicitly because Firefox appears to insist on them
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
    // do thing with node    
}
Now either NodeIterators suck or I'm doing it wrong.
Question: When should I use a NodeIterator?
In case you don't know, DOM4 specifies what NodeIterator is.
 
                        
It's slow for a variety of reasons. The most obvious is that nobody uses it, so far less time has been spent optimizing it. The other problem is that it's massively re-entrant: every node requires a call into JS to run the filter function.
If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing, using getElementsByTagName("*") and then running an identical filter on the result. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.
Filtering the nodes entirely in JS (the getElementsByTagName case) or entirely in C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.
Note also that selector matching, as used by querySelectorAll, is comparatively smart: it does right-to-left matching and is based on pre-computed caches. Most browsers will iterate over a cached list of all elements with the class "klass", check whether each one is an a element, and then check whether its parent is a div, so they won't even bother iterating over the entire document.
Given that, when should you use NodeIterator? Basically never, in JavaScript at least. In languages such as Java (undoubtedly the primary reason why there's an interface called NodeIterator), it will likely be just as quick as anything else, because your filter will then be in the same language as the iterator itself. Apart from that, the only other time it makes sense is in languages where the memory cost of creating a Node object is far greater than that of the internal representation of the Node.
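For reference, the getElementsByTagName("*") reimplementation mentioned above might look roughly like the following. This is only a sketch, not the exact code from the benchmark: the names isDivChildLink and filterAllElements are made up for illustration, and the parentNode null check is a defensive addition that the original filter does not have.

```javascript
// The same test as the acceptNode filter in the question, written as a
// plain predicate that returns true or false.
function isDivChildLink(node) {
    return node.tagName === "A" &&
        node.classList.contains("klass") &&
        node.parentNode !== null &&
        node.parentNode.tagName === "DIV";
}

// Hypothetical helper: fetch every element once, then filter entirely in JS.
// `root` is assumed to be a Document or an Element.
function filterAllElements(root) {
    var all = root.getElementsByTagName("*");
    var matches = [];
    for (var i = 0; i < all.length; i++) {
        if (isDivChildLink(all[i])) {
            matches.push(all[i]);
        }
    }
    return matches;
}
```

In a browser this would be called as filterAllElements(document). The point is that the engine is asked for the element list once and the predicate then runs as ordinary JS, rather than the engine calling back into JS for each node.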