Spark automatically decides the number of partitions based on the size of the input file. I have two questions:
Can I specify the number of partitions myself, rather than letting Spark decide how many to use?
How bad is the shuffle triggered by a repartition? Is it really expensive for performance? In my case I need to repartition to 1 so I can write out a single Parquet file, and the data currently sits in 31 partitions. How bad is that, and why?
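Roughly what I'm doing (a minimal sketch; `df` and the paths are placeholders for my actual data):

```scala
// Reading the input; Spark chooses the partition count itself (31 in my case).
val df = spark.read.parquet("/path/to/input")
println(df.rdd.getNumPartitions)  // prints 31 for my file

// Repartitioning down to 1 so the output directory contains a single Parquet file.
df.repartition(1)
  .write
  .parquet("/path/to/output")
```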
`repartition` and `coalesce` are the two functions used to change the number of partitions once the data has been read.
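For example (a minimal sketch, assuming `df` is an already-loaded DataFrame and the output path is a placeholder):

```scala
// repartition(n) performs a full shuffle and can either increase or
// decrease the number of partitions; data is redistributed evenly.
val evenlySpread = df.repartition(8)

// coalesce(n) only merges existing partitions, so it avoids a full
// shuffle, but it can only decrease the partition count.
val merged = df.coalesce(1)

// For writing a single Parquet file, coalesce(1) is often cheaper
// than repartition(1) because it skips the shuffle step.
merged.write.parquet("/path/to/single-file-output")
```

One caveat: `coalesce(1)` collapses the preceding stage to a single task, so if the upstream computation is heavy, `repartition(1)` (which keeps the earlier stage parallel and only shuffles at the end) can actually finish faster despite the shuffle.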