Spark program running very slowly on cluster


I am trying to run my PySpark job on a cluster with 2 worker nodes and 1 master (each with 16 GB RAM). I submitted the job with the command below.

spark-submit --master yarn --deploy-mode cluster --name "Pyspark" --num-executors 40 --executor-memory 2g CD.py

However, my code runs very slowly: it takes almost 1 hour to parse 8.2 GB of data. I then tried changing my YARN configuration by setting the following properties.

yarn.scheduler.increment-allocation-mb = 2 GiB

yarn.scheduler.minimum-allocation-mb = 2 GiB

yarn.scheduler.maximum-allocation-mb = 2 GiB
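
For reference, in raw yarn-site.xml form these properties take integer megabyte values, so the 2 GiB settings above correspond to roughly:

yarn.scheduler.minimum-allocation-mb = 2048

yarn.scheduler.increment-allocation-mb = 2048

yarn.scheduler.maximum-allocation-mb = 2048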

After making these changes, my Spark job is still running very slowly, taking more than 1 hour to parse the 8.2 GB of files.

1 Answer

Answered by args:

Could you please try with the configuration below:

spark.executor.memory 5g

spark.executor.cores 5

spark.executor.instances 3

spark.driver.cores 2
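
For example, with the same spark-submit call from the question (assuming the same CD.py script and YARN cluster mode), these settings could be passed roughly like this:

spark-submit --master yarn --deploy-mode cluster --name "Pyspark" --conf spark.executor.memory=5g --conf spark.executor.cores=5 --conf spark.executor.instances=3 --conf spark.driver.cores=2 CD.py

The idea is that three 5 GB executors should fit comfortably on two 16 GB nodes (leaving headroom for YARN and memory overhead), whereas 40 executors cannot all be allocated on a cluster of this size.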