How to add a custom node label to task nodes in EMR


I want to run my Spark executors only on task nodes in my AWS EMR cluster, and YARN node labels are one way to achieve this. I can specify the label during spark-submit. I want to achieve the following:

  1. Add a custom label during the cluster start-up.
  2. Associate this label with any node that joins my cluster during auto-scaling.

I want to do this to reduce the cost of my cluster by ensuring that all executors run on Spot Instances.

There is 1 answer

Answer by Rahul Garg

We achieved it through the process below.

  1. While the master node boots, we run a custom script that creates a new TASK label. EMR creates the CORE label automatically.
  2. While each core and task node boots, we identify the node's type from the metadata API and attach the appropriate label to the machine. If it is an On-Demand instance, we attach the CORE label; otherwise we attach the TASK label.
  3. When we submit our Spark job, we set the executor node label expression to TASK, which ensures that all executors run on TASK nodes only.
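
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script we run: the helper names (`fetch_lifecycle`, `label_for_lifecycle`, `add_label_cmd`, `replace_label_cmd`, `spark_submit_args`) are hypothetical, the `exclusive=false` setting is an assumption, and it assumes the EC2 instance metadata service (IMDSv2) is reachable from the node and that the `yarn` CLI is on the PATH. The helpers only build the commands; running them from a bootstrap script is left to the caller.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest"


def fetch_lifecycle(timeout=2.0):
    """Return 'spot' or 'on-demand' for the current instance (IMDSv2)."""
    # IMDSv2 requires a session token obtained with a PUT request.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-life-cycle",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def label_for_lifecycle(lifecycle):
    """Step 2: On-Demand nodes get CORE, everything else gets TASK."""
    return "CORE" if lifecycle == "on-demand" else "TASK"


def add_label_cmd(label="TASK"):
    """Step 1 (master node): command that registers a new cluster node label.

    exclusive=false is an assumption; pick the exclusivity you need.
    """
    return ["yarn", "rmadmin", "-addToClusterNodeLabels", f"{label}(exclusive=false)"]


def replace_label_cmd(hostname, label):
    """Step 2 (core/task node): command that attaches a label to one node."""
    return ["yarn", "rmadmin", "-replaceLabelsOnNode", f"{hostname}={label}"]


def spark_submit_args(label="TASK"):
    """Step 3: spark-submit arguments that pin executors to the label."""
    return ["--conf", f"spark.yarn.executor.nodeLabelExpression={label}"]
```

On a worker node, a bootstrap script could call `replace_label_cmd(socket.getfqdn(), label_for_lifecycle(fetch_lifecycle()))` and pass the result to `subprocess.run(..., check=True)`. The hostname must match the name under which the NodeManager registered with YARN, so `socket.getfqdn()` is only a guess here. The YARN scheduler queue also has to be allowed to use the label, which this sketch does not cover.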