Spark - Only driver runs tasks


I'm trying to run GroupByTest in Spark standalone mode. It runs successfully on one machine, but when I run it on a different machine it completes, yet the Driver seems to be the only instance performing tasks.

I know this because I added log messages to the shuffle manager constructor, and I can see that the Driver is the only instance calling this constructor. On the first machine I can see two instances calling it: the Driver and a worker/executor.

What is the reason for this? On both machines I ran it with the same configuration: 1 node, 1 executor, the same amount of memory, and the same job size.
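As a diagnostic sketch (in spark-shell, or any code with a live `SparkContext` named `sc`), you can check which executors actually run tasks by reading the executor ID inside a task; `SparkEnv.get.executorId` reports `"driver"` when the task runs in the driver process:

```scala
// Run a trivial job and collect the distinct executor IDs that executed tasks.
// If the only result is "driver", tasks are running inside the driver JVM.
val executorIds = sc.parallelize(1 to 100, 4)
  .map(_ => org.apache.spark.SparkEnv.get.executorId)
  .distinct()
  .collect()
println(executorIds.mkString(", "))
```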


There are 2 answers

Sourav Khoso

Thanks for asking the question!

I suspect you are using `local` as the Spark master in this case; with a `local` master, tasks run inside the driver JVM and no separate executor is launched. You can switch the master to YARN, Kubernetes, or Mesos depending on your distributed environment setup, or point it at your standalone master.
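For illustration, the master can be set when building the `SparkConf` (the host and port below are placeholders for your standalone master):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("GroupByTest")
  // "local[*]" would run all tasks inside the driver JVM;
  // point this at your standalone master instead:
  .setMaster("spark://<master-host>:7077")
```

The same choice can be made at submit time with `spark-submit --master spark://<master-host>:7077 ...` instead of hard-coding it.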

Brave

I've identified the issue, and it turns out the problem lies with my own implementation. It appears that the Driver utilizes a worker to execute tasks, but the worker fails when attempting to initialize my Shuffle Manager.

My mistake was that I was looking in the wrong place for the logs. Apparently Spark writes the Worker/Driver logs to two locations: by default, $SPARK_HOME/logs and $SPARK_HOME/work.

I mistakenly checked only the first location, but the actual logs for the Worker's execution were in the latter.
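As a rough sketch of where to look on a standalone worker (the application and executor IDs below are placeholders that depend on your run):

```shell
# Daemon logs for the master/worker processes themselves:
ls $SPARK_HOME/logs/

# Per-application executor logs, including stdout/stderr from each executor,
# live under the worker's work directory:
ls $SPARK_HOME/work/
tail $SPARK_HOME/work/app-<application-id>/<executor-id>/stderr
```

Exceptions thrown while an executor initializes a custom shuffle manager typically show up in that per-executor `stderr`, not in the daemon logs.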