Intermittent failure to retrieve Application Default Credentials on GKE Autopilot with the googleapis auth library


Some pods in my GKE Autopilot cluster aren't able to grab the Application Default Credentials to call other GCP services.

When I apply a new deployment, 1 or 2 of the 3 pods cannot authenticate using the googleapis (google-auth-library) npm package. I have tried googleapis v73.0.0 and the latest, v84.0.0.

I get:

    Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
        at GoogleAuth.getApplicationDefaultAsync (/node_modules/google-auth-library/build/src/auth/googleauth.js:173:19)

I am using this code and retrying on failure:

    const {google} = require('googleapis');

    // Resolve after `ms` milliseconds (the original snippet called sleep() without defining it).
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    const setGoogleAuth = async () => {
        try {
            const auth = new google.auth.GoogleAuth({
                // Scopes can be specified either as an array or as a single, space-delimited string.
                scopes: ['https://www.googleapis.com/auth/cloud-platform'],
            });

            // Acquire an auth client, and bind it to all future calls
            const authClient = await auth.getClient();
            google.options({auth: authClient});
        } catch (e) {
            console.error(e);

            // Sleep for 3 seconds, then retry (retries indefinitely)
            await sleep(3000);
            await setGoogleAuth();
        }
    };
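The retry in the snippet recurses without a limit, so a pod that never obtains credentials keeps looping. A minimal sketch of a bounded alternative follows; `withRetries`, `wait`, `attempts`, and `delayMs` are hypothetical names chosen for illustration, not part of googleapis or google-auth-library.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call an async function up to `attempts` times,
// pausing `delayMs` between tries, and rethrow the last error if all fail.
async function withRetries(fn, {attempts = 5, delayMs = 3000} = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      console.error(`Attempt ${attempt} of ${attempts} failed:`, err.message);
      if (attempt < attempts) {
        await wait(delayMs);
      }
    }
  }
  throw lastError;
}
```

It could wrap the credential lookup as `await withRetries(() => new google.auth.GoogleAuth({scopes}).getClient())`, creating a fresh `GoogleAuth` on each try as the original code does. Letting the final error propagate means the pod can fail its startup instead of looping silently.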

Calling the metadata server manually with curl from a pod that fails to authenticate with the googleapis package returns a valid token:

    curl http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=<my-gcp-endpoint>
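The same check can be run from inside the Node process, which helps rule out differences between the shell and the application. This is a rough sketch, assuming Node 18+ (global `fetch`); `fetchIdentityToken`, `host`, and `fetchImpl` are hypothetical names, and the default host is the metadata server's usual DNS name. The metadata server requires the `Metadata-Flavor: Google` header on every request.

```javascript
// Hypothetical helper: request an identity token from the metadata server.
// `fetchImpl` is injectable so the function can be exercised off-cluster.
async function fetchIdentityToken(
  audience,
  {host = 'metadata.google.internal', fetchImpl = globalThis.fetch} = {}
) {
  const url = new URL(
    `http://${host}/computeMetadata/v1/instance/service-accounts/default/identity`
  );
  url.searchParams.set('audience', audience);

  const response = await fetchImpl(url.toString(), {
    headers: {'Metadata-Flavor': 'Google'},
  });
  if (!response.ok) {
    throw new Error(`Metadata server responded with HTTP ${response.status}`);
  }
  // The identity endpoint returns the token as plain text.
  return response.text();
}
```

Logging the result of `fetchIdentityToken('<my-gcp-endpoint>')` at startup, next to the auth library's error, would show whether the metadata server is reachable at the moment the library gives up.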

Sometimes killing the pods and letting them be recreated works (I am using a Horizontal Pod Autoscaler). Other times the deployment has no problems at all. At other times, killing the pods so that they are recreated doesn't help. The behaviour seems very non-deterministic.

Any help would be appreciated, thank you!

1 Answer

Mike Gindin

Setting either DETECT_GCP_RETRIES=3 or K_SERVICE=true as an environment variable on the pods worked.

See the full discussion in the GitHub issue: https://github.com/googleapis/google-auth-library-nodejs/issues/1236
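The usual place for this variable is the container's `env` in the deployment spec. Where changing the spec is inconvenient, a default could be applied at the very top of the entry point instead. The sketch below assumes the auth library reads `DETECT_GCP_RETRIES` when it first probes for credentials rather than when it is loaded, which I have not verified. `ensureMetadataRetries` is a hypothetical helper name.

```javascript
// Hypothetical helper: give DETECT_GCP_RETRIES a default value without
// overriding one that was already supplied (for example by the pod spec).
function ensureMetadataRetries(env = process.env, retries = 3) {
  if (env.DETECT_GCP_RETRIES === undefined || env.DETECT_GCP_RETRIES === '') {
    env.DETECT_GCP_RETRIES = String(retries);
  }
  return env.DETECT_GCP_RETRIES;
}

// Assumption: this runs before the first GoogleAuth credential lookup.
ensureMetadataRetries();
```

Setting the value in the deployment remains the safer option, since it does not depend on the order in which modules are loaded.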