Cannot get faster results via YARN when running Spark on a Hadoop cluster


Applying an LSH algorithm in Spark 1.4 (https://github.com/soundcloud/cosine-lsh-join-spark/tree/master/src/main/scala/com/soundcloud/lsh), I process a 4 GB text file in LIBSVM format (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) to find duplicates. First, I ran my Scala script on a single server using only one executor with 36 cores, and retrieved my results in 1.5 hours.

In order to get my results much faster, I tried to run my code on a Hadoop cluster via YARN, on an HPC with 3 nodes where each node has 20 cores and 64 GB of memory. Since I don't have much experience running code on an HPC, I followed the suggestions given here: https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

As a result, I submitted the Spark job as below:

spark-submit --class com.soundcloud.lsh.MainCerebro --master yarn-cluster --num-executors 11 --executor-memory 19G --executor-cores 5 --driver-memory 2g cosine-lsh_yarn.jar 

As I understood it, this assigns 3 executors per node with 19 GB each, following the guide's arithmetic: leaving one of the 20 cores per node for the OS and YARN daemons allows roughly 3 executors of 5 cores each, and 64 GB split across 3 executors is about 21 GB, minus roughly 7% for off-heap overhead, which gives about 19 GB.

However, I still had no results even after more than 2 hours had passed.

My spark configuration is:

import org.apache.spark.SparkConf

val conf = new SparkConf()
      .setAppName("LSH-Cosine")
      .setMaster("yarn-cluster")
      // do not cap the size of results collected back to the driver
      .set("spark.driver.maxResultSize", "0")

How can I dig into this issue? Where should I start in order to improve the calculation time?

EDIT:

1)

I have noticed that coalesce is much slower on YARN:

  entries.coalesce(1, true).saveAsTextFile(text_string)
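A workaround I am considering (a sketch, assuming entries is an RDD[String] and text_string is an HDFS output path) is to skip the single-partition coalesce, keep the write parallel, and merge the part files afterwards:

    // coalesce(1, shuffle = true) funnels every record through a single
    // task, so the final write runs on one core regardless of cluster size.
    // Writing with the natural partitioning keeps the write parallel.
    entries.saveAsTextFile(text_string)

The part files can then be combined outside of Spark, e.g. with hdfs dfs -getmerge <output dir> <local file>.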

2)

EXECUTORS AND STAGES FROM HPC:

[screenshots: Spark UI executors and stages pages from the HPC/YARN run]

EXECUTORS AND STAGES FROM SERVER:

[screenshots: Spark UI executors and stages pages from the single-server run]

1 Answer

Answered by loneStar:

Most of the memory is tied up as storage memory, and you are not using it efficiently, i.e. you are caching the data. Less than 10 GB of the 40 GB total is actually used. You should reduce the storage memory and give that memory to execution instead.
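A minimal sketch of that change, assuming you stay on Spark 1.4's legacy memory manager (the 0.3/0.5 split below is illustrative, not a tuned value):

    import org.apache.spark.SparkConf

    // Spark 1.4 defaults: 0.6 of the heap for storage, 0.2 for shuffle.
    // Shrinking the cache share frees heap for execution (shuffles, joins).
    val conf = new SparkConf()
      .set("spark.storage.memoryFraction", "0.3") // less room for cached blocks
      .set("spark.shuffle.memoryFraction", "0.5") // more room for execution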

Even though you specified 11 executors, only 4 were actually started (this can be seen in the first Spark UI screenshot). Spark is using only 19 cores in total across all executors, and the total number of cores equals the number of tasks that can run in parallel.
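One way to approach this (a sketch with untuned values; it assumes each NodeManager can actually hand out around 60 GB, i.e. yarn.nodemanager.resource.memory-mb is not set lower) is to request executors that fit inside the YARN containers, remembering that each container needs executor memory plus spark.yarn.executor.memoryOverhead (max(384 MB, ~10% of executor memory) by default):

    spark-submit \
      --class com.soundcloud.lsh.MainCerebro \
      --master yarn-cluster \
      --num-executors 8 \
      --executor-cores 5 \
      --executor-memory 16G \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      --driver-memory 2g \
      cosine-lsh_yarn.jar

If YARN still starts fewer executors than requested, compare the request against yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.cpu-vcores in the cluster configuration.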

Please go through the following link.

https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz.html