How fragile is escape analysis in HotSpot in simple cases, such as the iterator in a for-each loop?

Suppose I have a java.util.Collection that I want to loop over. Normally I'd do this:

for(Thing thing : things) do_something_with(thing);

But suppose that this is in some core utility method that is used all over the place, and in most places the collection is empty. Then ideally we would prefer not to impose an iterator allocation on every single caller just to perform a no-op loop, so we could rewrite things like this:

if(things.isEmpty()) return;
for(Thing thing : things) do_something_with(thing);

An even more extreme option, if things is a List, would be to use a C-style for loop.
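
For concreteness, here is a minimal sketch of that indexed variant (assuming things is a List<Thing> with fast random access, such as ArrayList; Thing and do_something_with are the placeholders from above):

// Indexed loop: no Iterator object is ever created.
// Only sensible for RandomAccess lists such as ArrayList.
int n = things.size();
for (int i = 0; i < n; i++) {
    do_something_with(things.get(i));
}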

But wait: Java's escape analysis should eliminate this allocation, at least after the C2 compiler gets around to this method. So there should be no need for this "nano-optimization". (I won't even dignify it with the term micro-optimization; it's a bit too small for that.) Except...

I keep hearing that escape analysis is "fragile", but no one ever seems to say what in particular can mess it up. Intuitively, I would think that complicated control flow would be the main thing to fear, which means iterators in a for-each loop should be RELIABLY eliminated, since there the control flow is simple.

The standard response here is to try running an experiment, but unless I know the variables in play, it's pretty hard to trust any conclusions I might be tempted to draw from such an experiment.

Indeed, here's a blog post where someone tried such an experiment, and 2 out of 3 profilers gave the wrong results:

http://psy-lob-saw.blogspot.com/2014/12/the-escape-of-arraylistiterator.html

I know a lot less about obscure JVM wizardry than the author of that blog post, and am likely to be far more easily misled.

There are 2 answers

apangin (best answer)

Scalar Replacement is indeed a kind of optimization that you can never be absolutely certain about, because it depends on too many factors.

First, an allocation can be eliminated only when all uses of the instance are inlined into one compilation unit. In the case of an iterator, this means that the iterator constructor and the hasNext and next calls (including any nested calls) must all be inlined.

// For example, the iterator of CopyOnWriteArrayList: note that next()
// itself calls hasNext(), so both must be inlined together.
public E next() {
    if (! hasNext())
        throw new NoSuchElementException();
    return (E) snapshot[cursor++];
}

However, inlining itself is a fragile optimization in HotSpot, since it relies on many heuristics and limits. For example, it may happen that the iterator.next() call is not fully inlined into the loop because the maximum inlining depth has been reached, or because the outer compilation unit is already too big.
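
If you want to see what the JIT actually decided, HotSpot can log its inlining decisions (a diagnostic flag; the exact output format varies between JVM versions, and MyApp is just a placeholder here):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyApp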

Second, scalar replacement does not happen if a reference conditionally receives different values.

for(Thing thing : things) do_something_with(thing);

In your example, if things is sometimes an ArrayList and sometimes Collections.emptyList(), the iterator will be allocated on the heap. For elimination to happen, the type of the iterator must always be the same.
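
A minimal sketch of that failure mode (the class and method names here are made up for illustration; Thing and do_something_with are again the placeholders from the question):

import java.util.*;

class EscapeDemo {
    record Thing(String name) {}

    static void do_something_with(Thing t) { /* ... */ }

    // The shared utility method from the question.
    static void process(Collection<Thing> things) {
        for (Thing thing : things) do_something_with(thing);
    }

    public static void main(String[] args) {
        List<Thing> list = new ArrayList<>(List.of(new Thing("a")));
        for (int i = 0; i < 1_000_000; i++) {
            // Two different concrete collection types reach the same loop, so
            // the iterator inside process() is sometimes ArrayList$Itr and
            // sometimes Collections' empty iterator. C2 can no longer prove a
            // single iterator type, and scalar replacement may fail.
            process(list);
            process(Collections.emptyList());
        }
    }
}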

There are more examples in a great talk on Scalar Replacement by Ruslan Cheremin (it's in Russian, but YouTube's subtitle translation feature comes to the rescue).

Another recommended read is Aleksey Shipilёv's blog post, which also demonstrates how to use JMH to verify whether Scalar Replacement happens in a particular scenario or not.
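
To make that concrete, here is a minimal JMH sketch in the spirit of that post (the class and method names are my own; run it with JMH's GC profiler, e.g. -prof gc, and look at gc.alloc.rate.norm: a value near 0 B/op means the iterator was scalar-replaced):

import java.util.ArrayList;
import java.util.List;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class IteratorAllocBench {
    List<Integer> list;

    @Setup
    public void setup() {
        list = new ArrayList<>();
        for (int i = 0; i < 100; i++) list.add(i);
    }

    @Benchmark
    public void forEachLoop(Blackhole bh) {
        // If scalar replacement works, gc.alloc.rate.norm stays near
        // 0 B/op despite the iterator allocated by this loop.
        for (Integer i : list) bh.consume(i);
    }
}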

In short, in simple cases like yours, there is a high chance that allocation elimination will work as expected. There may be some marginal cases though, as I mentioned above.

There is a recent discussion on hotspot-compiler-dev mailing list regarding Partial Escape Analysis proposal. If implemented, it could significantly extend the applicability of Scalar Replacement optimization.

rzwitserloot

Your approach does not work. The correct approach is this:

  • Unless you are a performance expert (which is hard to become), do not make assumptions about what kind of code performs well vs. performs poorly, and maintain skepticism when analysing profiler reports. This is not particularly useful advice (it boils down to: A profiler report may be lying to you!), but it is what it is. Effectively, either be a performance expert, or accept that there's not much you can do about it. Sucks, but, don't shoot the messenger.
  • Write idiomatic java code. It is easiest to maintain and most likely to get optimized by hotspot.
  • Reduction of algorithmic complexity is useful and should always be the first thing you check; to some extent, an optimization that reduces algorithmic complexity gets to ignore the first rule. You do not need to be particularly up to date on the vagaries of JVMTI or Flight Recorder and how profilers work to conclude that an algorithmic rewrite is worthwhile and is going to significantly improve performance (see the sketch after this list).
  • Do not trust pithy rules of thumb, no matter how many people repeat them. Do not look for 'easy to apply' patterns like 'prepend an if-block that tests for empty to every for-each loop' - these are essentially never correct and usually reduce performance.
  • Be aware that bad performance advice is rampant. As a general principle of logical reasoning, the mere ubiquity of a claim otherwise bereft of proof or research should never be treated as "that makes it more likely to be true" (it is, after all, a logical fallacy!), and this counts double for performance!
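
To make the algorithmic-complexity point concrete, here is a hedged sketch (the method names are made up): replacing repeated List.contains calls with a HashSet turns an O(n * m) job into O(n + m), a rewrite no JIT will ever do for you:

import java.util.*;

class IntersectionExample {
    // O(n * m): every contains() call scans the whole list.
    static List<String> slowIntersection(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (b.contains(s)) out.add(s);
        }
        return out;
    }

    // O(n + m): one pass to build the set, one pass to probe it.
    static List<String> fastIntersection(List<String> a, List<String> b) {
        Set<String> lookup = new HashSet<>(b);
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (lookup.contains(s)) out.add(s);
        }
        return out;
    }
}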

More in-depth musing

Presumably, you're not going to trust the above maxims just because I'm telling you to trust them. I'll try to take you through some falsifiable reasoning to show you why the above maxims are correct.

In particular, this idea of checking for empty first seems extremely misguided.

Let's first translate the well-known maxim premature optimization is the root of all evil (overly hyperbolic, and therefore rather useless as stated) into something more tangible:

Do not make your code an ugly, caveat-ridden mess of weirdness because of an imagined performance issue.

Why can't I go by often-heard maxims?

Do not go by "people" here, because "people" are notorious for being utterly wrong on performance, time and time again. If you can find widespread, pithy statements, entirely bereft of proof or research, that X is good or bad for performance, you can rest assured that this means absolutely nothing whatsoever. Your average Joe Twitter writer or whatnot is a clueless idiot in this regard. Proof, ample research, or credentials are an absolute requirement for taking things seriously, preferably 2 or 3 of those. There are lists of well-known performance falsehoods (commonly held beliefs about how to improve JVM performance that do not help whatsoever and often actually hurt), and if you then search for these falsehoods you can find entire hordes of folks who espouse them, thus proving that you just cannot trust anything based solely on the fact that you "keep hearing it".

Note also that for just about every imaginable line of java code, you can come up with 100+ plausible if somewhat exotic ideas on how to make the code less obvious but seemingly 'more performant'. Clearly then you can't apply all 100 variants to every line in the entire project, so the road you were planning on taking here ("I do not quite trust that profiler, I find it plausible escape analysis will fail to eliminate this iterator allocation, so, just to be safe I will add an if that checks for empty first"), ends in a disaster where even the simplest task becomes a many-lined, seemingly excessively redundant soup. And performance will be worse on average, so it's a lose-lose scenario.

Here's a simple example to drive the point home; you can watch the presentations by Douglas Hawkins (referenced below) for more of this sort of thing:

List<String> list = ... retrieve thousands of entries ...;
// Pre-sized array: avoids one allocation, so surely faster, right?
String[] arr1 = list.toArray(new String[list.size()]);
// Zero-length array: forces toArray to allocate the result itself.
String[] arr2 = list.toArray(new String[0]);

It is quite plausible that the arr1 line is faster, right? It avoids creating a new array that is then immediately eligible for garbage collection. And yet, it turns out, arr2 is faster, because hotspot recognizes this pattern and optimizes out the zeroing of the array (not a thing you can do in java, but perfectly possible in machine code, of course), since it knows that all bytes are overwritten regardless.

Why should I write idiomatic java code?

Keep in mind, hotspot is a system that tries to identify patterns, and applies optimizations to these patterns. There is an infinite number of patterns one could theoretically optimize, so hotspot is designed to search for useful ones: take a given pattern, and weigh [odds that it appears in your average java project * how often it shows up in performance-crucial code paths * amount of performance gain that can be realized]. You should take away from this that you should write idiomatic java code. If you write bizarro java code nobody else writes, hotspot is vastly more likely to fail to optimize it, because the authors of hotspot are people too, and they optimize for common cases, not for weirdness. SOURCE: Douglas Hawkins, JVM performance engineer at Azul (for example, this devoxx presentation); many other JVM performance engineers have said similar things.

In passing, you get code that is easy to maintain, and easy to explain - because other java coders will read it and find familiar ground.

Seriously, becoming a performance expert is the only way?

Mostly. But, hey, CPU and memory are quite cheap, and hotspot rarely makes algorithmic improvements. That is, hotspot will rarely turn an O(n^2) algorithm into, say, an O(n) one: if you graph 'size of input' against 'time taken to run the algorithm' and the curve looks like y = x^2, it is rare to impossible for hotspot to turn that into a linear y = x affair. The improvements tend to be constant factors, so throwing some more CPU cores and/or RAM at the problem is generally just as effective.

Also, of course, algorithmic wins always dwarf whatever hotspot and micro/nano-optimizations could ever do for you.

Thus: just write code that looks good, is easy to test, is written in an idiomatic fashion, and uses the right, most efficient algorithms, and it'll run fast. If it isn't fast enough, throw more CPU or RAM at it. If it still isn't fast enough, spend 10 years becoming an expert.

"Lets add an empty check, yaknow, just in case!" doesn't fit in that plan.