I want to run my Spark executors only on the task nodes of my AWS EMR cluster, and YARN node labels are one way to achieve this, since I can specify a label expression during spark-submit. I want to achieve the following:
- Add a custom label during the cluster start-up.
- Associate this label to any node joining my cluster during auto-scaling.
I want to do this so that I can reduce the cost of my cluster by ensuring all executors run on Spot instances.
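For context, the moving parts can be sketched roughly as follows. This is a minimal sketch, not the full solution: the label name `spotlabel`, the node hostname, and `my_job.py` are placeholders, and it assumes YARN node labels are already enabled on the cluster (`yarn.node-labels.enabled=true`).

```shell
# Sketch only — "spotlabel", the hostname, and my_job.py are hypothetical.

# 1) Register the custom label with the ResourceManager at cluster start-up
#    (e.g. from a step or script run on the master node).
yarn rmadmin -addToClusterNodeLabels "spotlabel(exclusive=false)"

# 2) Attach the label to a node as it joins the cluster (run for each task
#    node, e.g. from a bootstrap action or a script triggered on scale-out).
yarn rmadmin -replaceLabelsOnNode "ip-10-0-0-42.ec2.internal:8041=spotlabel"

# 3) Point Spark executors at the labeled nodes during spark-submit.
spark-submit \
  --master yarn \
  --conf spark.yarn.executor.nodeLabelExpression=spotlabel \
  my_job.py
```

Making `exclusive=false` means unlabeled capacity can still be shared, so other applications are not starved of the labeled nodes.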
We achieved this through the process below.