For my task, I have an input RDD, and I'm evaluating the scores given by 10-100 models and summing those scores. To parallelize, I'm using multithreading, with each thread evaluating one model and producing one score RDD. However, this approach requires extra memory to hold all the score RDDs, plus additional runtime to sum all the RDDs at the end.
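To make the setup concrete, here is a pure-Python sketch of the pattern (the model functions and inputs are hypothetical stand-ins; in my real job, `inputs` is an RDD and each model evaluation produces a score RDD):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in models: each one scores every input record.
def make_model(weight):
    return lambda x: weight * x

inputs = [1.0, 2.0, 3.0]
models = [make_model(w) for w in (0.5, 1.0, 2.0)]

# One thread per model, each producing its own list of scores
# (in Spark: its own score RDD held in memory).
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    score_lists = list(pool.map(lambda m: [m(x) for x in inputs], models))

# Extra pass at the end to sum the per-model scores element-wise.
totals = [sum(scores) for scores in zip(*score_lists)]
print(totals)  # [3.5, 7.0, 10.5]
```

The memory and runtime overhead I'm worried about comes from the intermediate `score_lists` (one score RDD per model) and the final element-wise summation.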
Is there a way to let each thread modify one shared RDD within Spark? I considered accumulators, but it seems I would need a large number of them (one per input record). The naive approach of using a global RDD variable also does not work.