Does enabling "workload identity" on an existing GKE cluster and node pool cause any downtime?


I can't find any documentation that says whether there is any downtime when a GKE cluster is edited to activate Workload Identity.

I would like to know if there is any downtime:

  1. while enabling it in an existing cluster
  2. while enabling it in an existing node pool

I tried reaching out to the GCP team through the feedback link, but they suggested asking on Stack Exchange.


There are 2 answers

Baskar Lingam Ramachandran (BEST ANSWER)

We went ahead and tried this out:

  • Enabling Workload Identity at the cluster level causes downtime for the control plane only (the cluster cannot be edited while the update runs, but existing workloads are unaffected).

  • Enabling Workload Identity at the node-pool level recreates the nodes (GKE automatically cordons and recreates them).
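
For reference, the two operations described above map roughly to the gcloud commands built by the sketch below. It only assembles and prints the commands; it does not run them. The cluster, pool, and project names are placeholders, and the region or zone is assumed to come from your gcloud configuration.

```python
import shlex
from typing import List


def enable_on_cluster(cluster: str, project_id: str) -> List[str]:
    """Build the gcloud command that enables Workload Identity on a cluster.

    Region/zone flags are omitted on purpose; this sketch assumes they are
    set in the local gcloud configuration.
    """
    return [
        "gcloud", "container", "clusters", "update", cluster,
        f"--workload-pool={project_id}.svc.id.goog",
    ]


def enable_on_node_pool(pool: str, cluster: str) -> List[str]:
    """Build the gcloud command that enables Workload Identity on a node pool.

    Per the answer above, expect GKE to cordon and recreate the pool's nodes.
    """
    return [
        "gcloud", "container", "node-pools", "update", pool,
        f"--cluster={cluster}",
        "--workload-metadata=GKE_METADATA",
    ]


if __name__ == "__main__":
    # Placeholder names; replace with your own before running the output.
    for cmd in (
        enable_on_cluster("my-cluster", "my-project"),
        enable_on_node_pool("default-pool", "my-cluster"),
    ):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into a shell during a maintenance window.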

Wojtek_B

If you do everything "by the book", just enabling Workload Identity causes no downtime. However, you have to consider the following:

Workload Identity allows workloads in your GKE clusters to impersonate Identity and Access Management (IAM) service accounts to access Google Cloud services.

When you enable this feature on a running cluster, nothing actually happens to it. Only a new node pool added to the cluster will start using this type of authentication. Nodes in the existing pools stay unaffected.

So, by just enabling the feature, there will be no downtime.

However,

After you enable Workload Identity on an existing cluster, you might want to migrate your running workloads to use Workload Identity. Select the migration strategy that is ideal for your environment. You can create new node pools with Workload Identity enabled, or update existing node pools to enable Workload Identity.

Unless you pick a suitable migration strategy, you may see some downtime. That is the first exception.

There's one more way to "cause downtime". After you enable the feature on the cluster, you can force it to be enabled for existing node pools as well. In that case you can expect some downtime, because you should first have configured your app to use it and then migrated it to a new pool:

Modifying the node pool immediately enables Workload Identity for any workloads running in the node pool. This prevents the workloads from using the Compute Engine default service account and might result in disruptions.
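
The "new node pool" migration path mentioned above could look roughly like the sketch below. It only builds and prints the commands, it does not execute them. The cluster and pool names are placeholders, and it assumes your workloads are already configured for Workload Identity before they get rescheduled onto the new pool.

```python
import shlex
from typing import List


def migration_commands(cluster: str, old_pool: str, new_pool: str) -> List[List[str]]:
    """Build the commands for moving workloads to a Workload Identity pool.

    Assumption: workloads are already set up to use Workload Identity,
    otherwise they may lose access once they land on the new pool.
    """
    # GKE labels every node with the name of the node pool it belongs to.
    selector = f"cloud.google.com/gke-nodepool={old_pool}"
    return [
        # 1. Create a new pool with Workload Identity enabled.
        [
            "gcloud", "container", "node-pools", "create", new_pool,
            f"--cluster={cluster}",
            "--workload-metadata=GKE_METADATA",
        ],
        # 2. Stop new pods from being scheduled on the old pool.
        ["kubectl", "cordon", "-l", selector],
        # 3. Evict pods from the old pool so they reschedule onto the new one.
        ["kubectl", "drain", "-l", selector, "--ignore-daemonsets"],
    ]


if __name__ == "__main__":
    # Placeholder names; replace with your own before running the output.
    for cmd in migration_commands("my-cluster", "default-pool", "wi-pool"):
        print(shlex.join(cmd))
```

Draining the old pool step by step, instead of updating it in place, lets you control when each workload moves and roll back by uncordoning the old nodes.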