Cloud Composer 2: prevent eviction of worker pods

I am currently planning to upgrade our Cloud Composer environment from Composer 1 to 2. However, I am quite concerned about disruptions that could occur in Cloud Composer 2 due to the new autoscaling behavior inherited from GKE Autopilot. In particular, since nodes will now autoscale based on demand, it seems that nodes with running workers could be killed off if GKE thinks the workers could be rescheduled elsewhere. This would be bad because my code isn't currently very tolerant of retries.

I think that this can be prevented by adding the following annotation to the worker pods: "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"

However, I don't know how to add annotations to worker pods created by Composer (I'm not creating them myself, after all). How can I do that?

EDIT: I think this issue is made more complex by the fact that it should still be possible for the cluster to evict a pod once it's finished processing all its Airflow tasks. If the annotation is added but doesn't go away once the pod is finished processing, I'm worried that could prevent the cluster from ever scaling down.

So a more dynamic solution may be needed, perhaps one that takes into account the actual tasks that Airflow is processing.
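As a rough sketch of what such a dynamic approach might look like: the function below builds a JSON merge patch that sets the annotation to "false" while a worker has tasks in flight and removes it (by setting it to null) once the worker is idle. The name `eviction_patch` and the `active_task_count` input are hypothetical; how to obtain that count from Airflow and how to apply the patch to a Composer-managed pod are not shown and are assumptions left open here.

```python
import json
from typing import Optional

# Annotation key taken from the question above.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def eviction_patch(active_task_count: int) -> dict:
    """Build a JSON merge patch for a worker pod's metadata.

    While the worker is busy, the annotation is set to "false".
    When it is idle, the value is None, which serializes to null;
    in a JSON merge patch, null removes the key.
    """
    if active_task_count < 0:
        raise ValueError("active_task_count must not be negative")
    value: Optional[str] = "false" if active_task_count > 0 else None
    return {"metadata": {"annotations": {SAFE_TO_EVICT: value}}}


if __name__ == "__main__":
    print(json.dumps(eviction_patch(3)))  # worker busy: protect the pod
    print(json.dumps(eviction_patch(0)))  # worker idle: drop the annotation
```

Something would still have to run this periodically for each worker and send the resulting patch to the cluster, so it only illustrates the decision, not a full solution.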

There is 1 answer

Muhammad Asadullah On

If I have understood your problem correctly, could you please try this solution:

  1. In the Cloud Composer environment, navigate to the Kubernetes Engine --> Workloads page in the GCP Console.
  2. Find the worker pod you want to add the annotation to and click on the name of the pod.
  3. On the pod details page, click on the Edit button.
  4. In the Pod template section, find the Annotations field and click on the pencil icon to edit.
  5. In the Edit annotations field, add the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
  6. Click on the Save button to apply the change.
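The effect of the steps above on a pod's metadata can be sketched in Python as follows. This is only an illustration on a plain dictionary, not a call to any Composer or GKE API; the function name `with_safe_to_evict_false`, the pod name, and the existing annotation are made up for the example. Note that in Kubernetes an annotation set directly on a running pod generally does not carry over to replacement pods created later by the controller.

```python
import copy

# Annotation key and value taken from step 5 above.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def with_safe_to_evict_false(pod: dict) -> dict:
    """Return a copy of a pod manifest with the annotation added.

    Existing annotations are kept, and the input is not modified.
    """
    updated = copy.deepcopy(pod)
    metadata = updated.setdefault("metadata", {})
    # "annotations" may be missing or explicitly null in a manifest.
    annotations = metadata.get("annotations") or {}
    annotations[SAFE_TO_EVICT] = "false"
    metadata["annotations"] = annotations
    return updated


if __name__ == "__main__":
    # Hypothetical worker pod manifest, reduced to the relevant fields.
    pod = {
        "metadata": {
            "name": "airflow-worker-example",
            "annotations": {"example.com/owner": "data-team"},
        }
    }
    print(with_safe_to_evict_false(pod)["metadata"]["annotations"])
```

The annotation value is the string "false" rather than a boolean, because Kubernetes annotation values are always strings.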

Let me know if it works fine. Good luck.