Why is a standard iterator faster than sequence in Kotlin?

1.2k views Asked by At

I am new to sequences, so I may have done something (more or less) terribly wrong, but I have a questiion:

I have written two functions:

fun isPrimeNumber1(number: Int): Boolean {
    if (number <= 1) return false
    for (divider in 2 .. number / 2) {
        if ( number % divider == 0 ) return false
    }
    return true
}

and

fun isPrimeNumber2(number: Int): Boolean {
    if (number <= 1) return false
    return !(2 .. number / 2).asSequence().map { it }.any { number % it == 0 }
}

Now I am running a test written in kotest, in which both functions recieve Int.MAX_VALUE as number.

class MyTestPrime : FunSpec({
    context("Prime numbers") {
        test("Should return true if number is prime (1st fun)") {
            isPrimeNumber1(Int.MAX_VALUE) shouldBe true
        }
        test("Should return true if number is prime (2nd fun)") {
            isPrimeNumber2(Int.MAX_VALUE) shouldBe true
        }
    }
})

Execution time of the isPrimeNumber1() function is around 3,5 seconds, and for the 2nd function isPrimeNumber2() - around 8,5 seconds.

Why is that so? Am I missing something about sequences? Or maybe my code does its purpose in a correct but very unoptimal way?

2

There are 2 answers

1
Utku Özdemir On BEST ANSWER

It is expected. The variant with sequence creates an Iterator object, and calls .hasNext() and .next() functions for each element.

Since an Iterator works with objects and not primitives, all your ints are boxed via Integer::valueOf calls. (Note: .map { it } step is redundant).

I ran both functions via Java Flight Recorder in IntelliJ Idea, and we can see that the sequence variant causes many more function calls compared to the other variant.

isPrimeNumber1:

isPrimeNumber1

isPrimeNumber2:

isPrimeNumber2

As you can see, the isPrimeNumber2 variant causes many more functions to be called under the hood, and therefore is impacted by their overhead.

Another way of inspecting it is to decompile both functions' bytecode to Java. It can give you better insight on what is going on under the hood. Here's both of the functions decompiled (again using IntelliJ):

private static final boolean isPrimeNumber1(int number) {
  if (number <= 1) {
    return false;
  } else {
    int divider = 2;
    int var2 = number / 2;
    if (divider <= var2) {
      while (true) {
        if (number % divider == 0) {
          return false;
        }

        if (divider == var2) {
          break;
        }

        ++divider;
      }
    }

    return true;
  }
}

private static final boolean isPrimeNumber2(int number) {
  if (number <= 1) {
    return false;
  } else {
    byte var1 = 2;
    Sequence $this$any$iv =
        SequencesKt.map(
            CollectionsKt.asSequence((Iterable) (new IntRange(var1, number / 2))),
            (Function1) null.INSTANCE);
    int $i$f$any = false;
    Iterator var3 = $this$any$iv.iterator();

    boolean var10000;
    while (true) {
      if (var3.hasNext()) {
        Object element$iv = var3.next();
        int it = ((Number) element$iv).intValue();
        int var6 = false;
        if (number % it != 0) {
          continue;
        }

        var10000 = true;
        break;
      }

      var10000 = false;
      break;
    }

    return !var10000;
  } 
}

Final Notes: As others have mentioned, to get meaningful performance measurements, you need to use a tool like jmh. However, as a rule of thumb, simpler language constructs, such as a regular for/while loop over a sequence, tend to have less overhead due to the lower level of abstraction they offer.

0
cactustictacs On

Utku's answer covers the why (yeah, the sequence code is much less optimal here) but generally speaking - Sequences are meant to be a complement to Iterables. They use the same functions, and you can chain them into processing pipelines.

The difference is that sequences are executed lazily, each item passing through the full pipeline before the next is handled, and items are only processed when they need to be.

For example, check this extremely good and plausible code:

import kotlin.system.measureTimeMillis

fun isMagicNumber(num: Double) = num == 10000.0

fun main(args: Array<String>) {
    val numberStrings = (1..25000).map(Int::toString)

    val itertime = measureTimeMillis {
        numberStrings
            .map(String::toInt).map { it * 2.0 }.first(::isMagicNumber)
    }
    println("Iterable version: $itertime ms")
    
    val seqtime = measureTimeMillis {
        numberStrings.asSequence()
            .map(String::toInt).map { it * 2.0 }.first(::isMagicNumber)
    }
    print("Sequence version: $seqtime ms")
}

Starting with a list of 25,000 numbers, as Strings, the pipeline is

  • map to an Int
  • double it (converting to a Double)
  • get the first one that meets the condition (equals 10,000, in this case)

The Iterable version does each step for the entire list, before moving onto the next step:

List<String> -> List<Int> -> List<Double> -> find element

It creates a new list each time (taking up memory), and it doesn't even start checking elements until it's made three lists and can start going over the last one - that's the earliest it can return a result.

Sequences do this:

String -> Int -> Double -> check

for each item. As soon as it hits the item that meets the check condition, it's done. That's gotta be way faster right!!!

Iterable version: 41 ms
Sequence version: 111 ms

Ah! Well. Nevertheless,

Turns out sequences have overhead (which is why a really basic for loop will obliterate them if you can write one), and also computers are pretty darn good at iterating over things - there are lots of optimisations under the hood, creating new arrays and iterating over them can be faster than using linked lists, etc. Which is why you really need to profile what you're doing, if efficiency is what you're going for.


What if the source list (and therefore all the additional lists the Iterable version creates) was 10x bigger, 250,000 items?

Iterable version: 260 ms
Sequence version: 155 ms

Oh hello, now we're getting somewhere. Turns out all that overhead starts to pile up after a while, and being able to exit early becomes important, and sequences start to get more efficient.

And this is just measuring time - you need to look at memory use too. Building huge lists can quickly use a lot of memory, or even become unrunnable (if you're doing Project Euler-levels of "make this work for a bajiliion kazillion items"). Sequences and their one-thing-at-a-time approach can keep the whole thing workable, and completable before the heat death of the universe

Also sequences can be infinite! Which limits where you can use them, or how efficient some operations are (last() needs to run through the entire sequence), but it can also be used in situations where a fixed-size collection won't work.


So yeah, use the right tool for the job, and make sure that if efficiency is important, you're actually using the version that gives the best results. But sometimes readability and composability are more important, and being able to chain operations to do a thing is better than a mean and lean for loop