Why does skip() in this parallel stream expression in Java 19 cause an OOM even with 8GB?

84 views Asked by At

I get an OOM in Java 19 even with 8GB if I do this:

IntStream
.iterate(0, i -> i + 1)
.skip(2)
.limit(10_000_000)
.filter(i -> checkSum(i) <= 20)
.parallel()
.count();

However, I don't get any OOM if I omit skip(2):

IntStream
.iterate(0, i -> i + 1)
//.skip(2)
.limit(10_000_000)
.filter(i -> checkSum(i) <= 20)
.parallel()
.count();

where checksum(...) is

public static long checkSum(long n) {
    long result = 0;
    long remaining = n;
    while (0 < remaining) {
        long remainder = remaining % 10;
        result += remainder;
        remaining = (remaining - remainder) / 10;
    }
    return result;
}

Why does skip() in this parallel stream expression in Java 19 cause an OOM even with 8GB?

I know I should use range(...) instead of iterate()+limit() with or without skip(). However, that doesn't answer me this question. I would like to understand what's the issue here.

1

There are 1 answers

0
Alexander Ivanchenko On BEST ANSWER

skip() - is a stateful operation which guarantees that n first elements of the stream (with respect of the encounter order, if the stream is ordered) would be omitted.

It would be cheap in a sequential pipeline, but might be costful while running in parallel if the stream is ordered. Documentation warns about that and suggests loosening the constraint ordering if possible.

API Note:

While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of skip() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with skip() in parallel pipelines, switching to sequential execution with BaseStream.sequential() may improve performance.

Emphasis added

The following unordered Stream would run without issues (result of the stream execution might not be consistent because in unordered stream threads a free to discard any n elements, not n first):

IntStream
    .iterate(0, i -> i + 1)
    .unordered()
    .skip(2)
    .limit(10_000_000)
    .filter(i -> checkSum(i) <= 20)
    .parallel()
    .count();