Scala - sort RDD partitions

Assume I have an RDD of Integers from 1 to 1,000,000,000 and I want to print them in order using foreachPartition. It can happen that the partition holding 5-6-7-8 is printed before the one holding 1-2-3-4. How can I prevent this?

Thanks, Maya

There is 1 answer

Patrick McGloin (best answer)

Each partition is processed by its own task, and those tasks can run in parallel, so the order in which partitions print is not deterministic. I think the only way to do this is to ensure there is only one partition and then print your data. You can call repartition(1) or coalesce(1) on your RDD to reduce the number of partitions. For your use case I think coalesce is better, as it avoids a shuffle.

https://spark.apache.org/docs/1.3.1/programming-guide.html#transformations
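
A minimal sketch of the coalesce(1) approach, assuming a local SparkContext and a small range standing in for 1 to 1,000,000,000. The object and helper names (OrderedPrint, toSinglePartition) are made up for illustration. It also assumes the elements are already in ascending order across partitions, as they are for a parallelized range; coalesce does not sort anything.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OrderedPrint {
  // Hypothetical helper: merge all partitions into one without a shuffle.
  def toSinglePartition(rdd: RDD[Int]): RDD[Int] = rdd.coalesce(1)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 4 threads, for demonstration only.
    val conf = new SparkConf().setAppName("ordered-print").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Small stand-in for the 1 to 1,000,000,000 range, split into 4 partitions.
      val numbers = sc.parallelize(1 to 20, 4)

      // With a single partition there is a single task, so nothing prints concurrently.
      toSinglePartition(numbers).foreachPartition { iter =>
        iter.foreach(println)
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that on a cluster the println output goes to the executor's stdout rather than the driver's, and a single partition means one task has to walk through all of the data.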