The number of tasks in Spark is determined by the total number of RDD partitions at the beginning of a stage. For example, when a Spark application reads data from HDFS, the partitioning of the Hadoop RDD is inherited from FileInputFormat in MapReduce, which is affected by the HDFS block size, the value of mapred.min.split.size, the compression codec, and so on.
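Roughly, FileInputFormat clamps the split size between the configured minimum and maximum around the block size, then cuts the file into splits of that size. A plain-Python sketch of that computation (simplified, assuming the file is splittable, i.e. not compressed with a non-splittable codec; function names are illustrative, not the Hadoop API):

```python
SPLIT_SLOP = 1.1  # Hadoop lets the last split run up to 10% over splitSize

def compute_split_size(block_size, min_size, max_size):
    # Mirrors FileInputFormat.computeSplitSize:
    # max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = 0, file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

MB = 1024 * 1024
# A 300 MB file with 128 MB blocks yields 3 splits, hence 3 tasks:
print(num_splits(300 * MB, 128 * MB))  # 3
# Raising the minimum split size to 300 MB collapses it to 1 task:
print(num_splits(300 * MB, 128 * MB, min_size=300 * MB))  # 1
```

This is why tuning mapred.min.split.size changes the task count of the reading stage.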

The tasks in the screenshot took 7, 7, and 4 seconds, and I want to balance them. Also, the stage is split into 3 tasks; is there any way to tell Spark how many partitions/tasks to use?
The number of tasks depends on the number of partitions. You can set a partitioner on the RDD, and in the partitioner you specify the number of partitions.
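Conceptually, a partitioner is just a function from a record's key to a partition index, with the partition count fixed up front. A plain-Python sketch of that idea (Spark's HashPartitioner works the same way, key.hashCode mod numPartitions; the class and method names here are illustrative, not the Spark API):

```python
class HashPartitioner:
    """Minimal stand-in for a hash partitioner: fixes the number of
    partitions and maps each key to one of them."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        # Non-negative modulo, analogous to Spark's Utils.nonNegativeMod
        return hash(key) % self.num_partitions

# With 3 partitions, every key deterministically lands in bucket 0, 1, or 2:
p = HashPartitioner(3)
buckets = {}
for key in ["a", "b", "c", "d", "e", "f"]:
    buckets.setdefault(p.get_partition(key), []).append(key)
print(all(0 <= pid < 3 for pid in buckets))  # True
```

In Spark itself you would pass the desired count to partitionBy, repartition, or the minPartitions argument of textFile rather than hand-rolling a class like this.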