Java8 - Mapping with Streams without Collecting for Performance

4k views Asked by At

Exponentially Growing Stream

I have a Stream that grows exponentially for creating permutations. So each call to addWeeks increases the number of elements in the Stream.

Stream<SeasonBuilder> sbStream = sbSet.stream();

for (int i = 1; i <= someCutOff; i++) {
    sbStream = sbStream.map(sb -> sb.addWeeks(possibleWeeks))
                       .flatMap(Collection::stream); 
}

// Collect SeasonBuilders into a Set
return sbStream.collect(Collectors.toSet());   // size > 750 000 

Problems

  • Each call to addWeeks returns a Set<SeasonBuilder> and collecting everything into a Set takes a while.
  • addWeeks is not static and needs to be called on each SeasonBuilder in the stream, each time through the loop

    public Set<SeasonBuilder> addWeeks(
        final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
            return possibleWeeks.stream()
                    .filter(containsMatchup())   // Finds the weeks to add
                    .map(this::addWeek)   // Create new SeasonBuilders with the new week
                    .collect(Collectors.toSet());
    
  • Out of memory error..... when possible weeks has size = 15

Questions

  • Should I be using a method chain other than map followed by flatmap?
  • How can I modify addWeeks so that I don't have to collect everything into a Set?
    • Should I return a Stream<SeasonBuilder>? Can I flatmap a Stream?

Update:

Thanks for the help everyone!

I have put the code for the methods in a gist

Thanks to @Holger and @lexicore for suggesting returning a Stream<SeasonBuilder> in addWeeks. Minor performance increase, as was predicted by @lexicore

I tried using parallelStream() and there was no significant change in performance

Context

I am creating all possible permutations of a Fantasy Football season, which will be used elsewhere for stats analysis. In a 4-team, 14-week season, for any given week, there could be three different possibilities

  • (1 vs 2) , (3 vs 4)
  • (1 vs 3) , (2 vs 4)
  • (1 vs 4) , (2 vs 3)

To solve the problem, plug in the permutations, and we have all our possible seasons. Done! But wait... what if Team 1 only ever plays Team 2. Then the other teams would be sad. So there are some constraints on the permutations.

Every team must play each other roughly the same amount of times (i.e. Team 1 cannot play against Team 3 ten times in a single season). In this example - 4-teams, 14 weeks - each team is capped at playing another team 5 times. So some sort of filtering has to happen when creating permutations, to avoid non-valid seasons.

Where this gets more interesting is:

  • 6 Team League -- 15 possible weeks
  • 8 Team League -- 105 possible weeks
  • 10 Team League -- 945 possible weeks

I am trying to optimize performance where possible, because there are a lot of permutations to create. A 4-team, 14-week season creates 756 756 (=14!/(5!5!4!)) possible seasons, given the constraints. 6-team or 8-team seasons just get crazier.

1

There are 1 answers

1
lexicore On BEST ANSWER

Your whole construction is very suspicious to begin with. If you're interested in performance it is unlikely that generating all permutations explicitly is a good approach.

I also don't believe that collecting to set and streaming again is the performance problem.

But nevertheless, to answer your question: why don't you return Stream<SeasonBuilder> from addWeeks directly, why do you collect it to set first? Return the stream directy, without collecting:

public Stream<SeasonBuilder> addWeeks(
    final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
        return possibleWeeks.stream()
                .filter(containsMatchup())   // Finds the weeks to add
                .map(this::addWeek);   // Create new SeasonBuilders with the new week
}

You won't need map/flatMap then, just one flatMap:

sbStream = sbStream.flatMap(sb -> sb.addWeeks(possibleWeeks));

But this won't help your performance much anyway.