Spark: can I manually specify the number of partitions when calling textFile?


Spark automatically decides the number of partitions based on the size of the input file. I have two questions:

Can I specify the number of partitions myself rather than letting Spark decide how many?

How expensive is the shuffle triggered by repartitioning? In my case I need to repartition to 1 in order to write a single Parquet file, and the input currently has 31 partitions. How bad is that, and why?


There are 2 answers

Prashant On

`repartition` and `coalesce` are the two functions used to change the number of partitions once the data has been read. `repartition` performs a full shuffle and can increase or decrease the partition count; `coalesce` only merges existing partitions, so it avoids a full shuffle and is the cheaper choice when you are only reducing the count, e.g. down to 1 before writing a single Parquet file.

Brian Z On

You can't set an exact partition count at read time; the number of partitions is determined by the input splits (file size and block size). `textFile` does accept a `minPartitions` argument, but it is only a lower-bound hint, not an exact count.