I'm trying to run GroupByTest using Spark standalone mode. It runs successfully on one machine. When I run it on a different machine it also works, but it seems like the Driver is the only instance that performs tasks.
I know this because I added log messages to the shuffle manager constructor, and on the second machine the Driver is the only instance calling that constructor. On the first machine I can see two instances calling it: the Driver and a worker/executor.
What is the reason for that? On both machines I ran it with the same configuration: 1 node, 1 executor, same amount of memory, same job size.
Thanks for asking the question!!
I suspect you are using the Spark master local in this case. You can switch the master to yarn, kubernetes, or mesos, depending on your distributed environment setup.
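To check that suspicion, it can help to look at the master value the job actually ran with. The following is only a minimal sketch in plain Python (no Spark required): `is_local_master` and `describe_master` are hypothetical helper names, not part of Spark, and the pattern assumes the usual local forms `local`, `local[N]`, `local[*]`, and `local[N,F]`. Anything else (for example a `spark://host:port` standalone URL, or `yarn`) is treated as a cluster master.

```python
import re

# Hypothetical helper, not a Spark API: recognizes the local-mode master strings
# local, local[N], local[*], and local[N,F] (F = max task failures).
_LOCAL_MASTER = re.compile(r"^local(\[(\d+|\*)(,\s*\d+)?\])?$")


def is_local_master(master: str) -> bool:
    """Return True if the master string selects Spark local mode."""
    return bool(_LOCAL_MASTER.match(master.strip()))


def describe_master(master: str) -> str:
    """Give a short, human-readable hint about where tasks will run."""
    if is_local_master(master):
        # In local mode, tasks run in threads inside the driver JVM,
        # so no separate executor process is started.
        return f"{master}: local mode, tasks run inside the driver process"
    return f"{master}: cluster master, tasks run in separate executor processes"


if __name__ == "__main__":
    # Example master values; the host name below is a placeholder.
    for m in ["local", "local[*]", "spark://master-host:7077", "yarn"]:
        print(describe_master(m))
```

In a running job the effective value is available as `sc.master` on the SparkContext, and it also appears as `spark.master` on the Environment tab of the Spark web UI. If it turns out to be one of the local forms on the second machine, that would explain why only the Driver constructs the shuffle manager there.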