For my task, I have an input RDD, and I'm evaluating the scores given by 10-100 models and summing those scores. To parallelize, I'm using multithreading, with each thread evaluating one model and producing one score RDD. However, this approach requires extra memory to hold all the score RDDs, plus additional runtime to sum all the RDDs at the end.
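To make the setup concrete, here is a pure-Python sketch of the pattern (the model functions and inputs are hypothetical stand-ins; in my real job, `inputs` is an RDD and each model evaluation produces a score RDD):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in models: each one scores every input record.
def make_model(weight):
    return lambda x: weight * x

inputs = [1.0, 2.0, 3.0]
models = [make_model(w) for w in (0.5, 1.0, 2.0)]

# One thread per model, each producing its own list of scores
# (in Spark: its own score RDD held in memory).
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    score_lists = list(pool.map(lambda m: [m(x) for x in inputs], models))

# Extra pass at the end to sum the per-model scores element-wise.
totals = [sum(scores) for scores in zip(*score_lists)]
print(totals)  # [3.5, 7.0, 10.5]
```

The memory and runtime overhead I'm worried about comes from the intermediate `score_lists` (one score RDD per model) and the final element-wise summation.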
Is there a way to let each thread modify one shared RDD within Spark? I considered accumulators, but it seems I would need a large number of them (one per input record). The naive approach of using a global RDD variable also does not work.