I am trying to run a PySpark job on a cluster with 2 nodes and 1 master (all with 16 GB RAM). I submitted the job with the command below.
spark-submit --master yarn --deploy-mode cluster --name "Pyspark" --num-executors 40 --executor-memory 2g CD.py
However, my code runs very slowly; it takes almost 1 hour to parse 8.2 GB of data. I then tried to change my YARN configuration and modified the following properties.
yarn.scheduler.minimum-allocation-mb = 2 GiB
yarn.scheduler.increment-allocation-mb = 2 GiB
yarn.scheduler.maximum-allocation-mb = 2 GiB
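For reference, if these are set directly in yarn-site.xml rather than through a management UI, the values are given in MB (2 GiB = 2048 MB); a minimal sketch for the two standard properties would be:

<!-- yarn-site.xml: container sizing, values in MB (2 GiB = 2048 MB) -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

(As far as I know, yarn.scheduler.increment-allocation-mb only applies when the Fair Scheduler is used, e.g. on CDH, so it may not have any effect on your cluster.)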
Even after these changes, my Spark job is still very slow and takes more than 1 hour to parse the 8.2 GB of files.
Could you please try the configuration below?
spark.executor.memory 5g
spark.executor.cores 5
spark.executor.instances 3
spark.driver.cores 2
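For example, assuming the same CD.py script from the question, these settings could be passed straight to spark-submit (a sketch; tune the numbers to what your nodes can actually hold):

spark-submit --master yarn --deploy-mode cluster --name "Pyspark" \
  --num-executors 3 \
  --executor-cores 5 \
  --executor-memory 5g \
  --driver-cores 2 \
  CD.py

The idea is that a few larger executors can actually be scheduled on your two 16 GB worker nodes, whereas YARN cannot run 40 executors of 2 GB each at the same time on that hardware.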