When to use NodeIterator

5.2k views Asked by At

Benchmark compares QSA & .forEach vs a NodeIterator

toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
  // do something with node
});

var filter = {
    acceptNode: function (node) {
        var condition = node.parentNode.tagName === "DIV" &&
            node.classList.contains("klass") &&
            node.tagName === "A";

        return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT
    }  
}
// FIREFOX Y U SUCK
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
    // do thing with node    
}

Now either NodeIterator's suck or I'm doing it wrong.

Question: When should I use a NodeIterator ?

In case you don't know, DOM4 specifies what NodeIterator is.

2

There are 2 answers

0
gsnedders On BEST ANSWER

It's slow for a variety of reasons. Most obviously is the fact that nobody uses it so quite simply far less time has been spent optimizing it. The other problem is it's massively re-entrant, every node having to call into JS and run the filter function.

If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing using getElementsByTagName("*") and then running an identical filter on that. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.

Filtering the nodes entirely in JS (the getElementsByTagName case) or C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.

Note also selector matching, as used by querySelectorAll, is comparatively smart: it does right-to-left matching and is based on pre-computed caches (most browsers will iterate over a cached list of all elements with the class "klass", check if it's an a element, and then check if the parent is a div) and hence they won't even bother with iterating over the entire document.

Given that, when to use NodeIterator? Basically never in JavaScript, at least. In languages such as Java (undoubtedly the primary reason why there's an interface called NodeIterator), it will likely be just as quick as anything else, as then your filter will be in the same language as the filter. Apart from that, the only other time it makes sense is in languages where the memory usage of creating a Node object is far greater than the internal representation of the Node.

7
Gust van de Wal On

NodeIterator (and TreeWalker, for that matter) are almost never used, because of a variety of reasons. This means that information on the topic is scarce and answers like @gsnedders' come to be, which fail to mention any of the API's unique features. I know this question is almost a decade old, so excuse my necromancy.

1. Initiation & Performance

It is true that the initiation of a NodeIterator is way slower than a method like querySelectorAll, but that is not the performance you should be measuring.

The thing about NodeIterators is that they are live-ish in the way that, just like an HTMLCollection or live NodeList, you can keep using the object after initiating it once.
The NodeList returned by querySelectorAll is static and will have to be re-initiated every time you need to match newly added elements.

This version of the jsPerf puts the NodeIterator in the preparation code. The actual test only tries to loop over all newly added elements with iter.nextNode(). You can see that the iterator is now orders of magnitudes faster.

2. Selector performance

Okay, cool, caching the iterator after initiation makes this example faster than querying. How you use the API still matters for iteration speed, though. In this version, you can observe another significant difference. I've added 10 classes (done[0-9]) that the selectors shouldn't be matching. The iterator loses about 10% of its speed, while the querySelectors lose 20%.

This version, on the other hand, shows what happens when you add another div > at the start of the selector. The iterator loses 33% of its speed, while the querySelectors got a speed INCREASE of 10%.

Removing the initial div > at the start of the selector like in this version shows that both methods become slower, because they match more than earlier versions. Like expected, the iterator is relatively more performant than the querySelectors in this case.

This means that filtering on basis of a node's own properties (its classes, attributes, etc.) is probably faster in a NodeIterator, while having a lot of combinators (>, +, ~, etc.) in your selector probably means querySelectorAll is faster.
This is especially true for the  (space) combinator. Selecting elements with querySelectorAll('article a') is way easier than manually looping over all parents of every a element, looking for one that has a tagName of 'ARTICLE'.

P.S. in §3.2, I give an example of how the exact opposite can be true if you want the opposite of what the space combinator does (exclude a tags with an article ancestor).

3. Impossible selectors

3.1 Simple hierarchical relationships

Of course, manually filtering elements gives you practically unlimited control. This means that you can filter out elements that would normally be impossible to match with CSS selectors. For example, CSS selectors can only "look back" in the way that selecting divs that are preceded by another div is possible with div + div. Selecting divs that are followed by another div is impossible.

However, inside a NodeFilter, you can achieve this by checking node.nextElementSibling.tagName === 'DIV'. The same goes for every selection CSS selectors can't make.

3.2 More global hierarchical relationships

Another thing I personally love about the usage of NodeFilters, is that when passed to a TreeWalker, you can reject a node and its whole sub-tree by returning NodeFilter.FILTER_REJECT instead of NodeFilter.FILTER_SKIP.

Imagine you want to iterate over all a tags on the page, except for ones with an article ancestor.
With querySelectors, you'd type something like

let a = document.querySelectorAll('a')
a = Array.prototype.filter.call(a, function (node) {
  while (node = node.parentElement) if (node.tagName === 'ARTICLE') return false
  return true
})

While in a NodeFilter, you'd only have to type this

return node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT : // ✨ Magic happens here ✨
       node.tagName === 'A'       ? NodeFilter.FILTER_ACCEPT :
                                    NodeFilter.FILTER_SKIP

In conclusion

NodeIterators and TreeWalkers shouldn't be instantiated for loads of loops and definitely shouldn't replace one-off loops. For all intents and purposes, they are just alternative methods for keeping track of lists/trees of nodes, with the latter also having a handy bit of sugar added with FILTER_REJECT.

There's one main advantage both NodeIterators and TreeWalkers have to offer:

  • Live-ishness, as discussed in §1

However, don't use them when any of the following is true:

  • Its instance is only going to be used once/a few times
  • Complex hierarchical relationships are queried that are possible with CSS selectors
    (i.e. body.no-js article > div > div a[href^="/"])