Some pods in my GKE Autopilot cluster fail to obtain the Application Default Credentials (ADC) they need to call other GCP services.
When I apply a new deployment, 1 or 2 of the 3 pods fail to authenticate through the googleapis npm package, which uses google-auth-library (I tried googleapis v73.0.0 and the latest, v84.0.0).
I get:
Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
    at GoogleAuth.getApplicationDefaultAsync (/node_modules/google-auth-library/build/src/auth/googleauth.js:173:19)
I am using this code and retrying on failure:
    const {google} = require('googleapis');

    // Resolve after `ms` milliseconds (used between retries).
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    const setGoogleAuth = async () => {
        try {
            const auth = new google.auth.GoogleAuth({
                // Scopes can be specified either as an array or as a single, space-delimited string.
                scopes: ['https://www.googleapis.com/auth/cloud-platform'],
            });
            // Acquire an auth client, and bind it to all future calls
            const authClient = await auth.getClient();
            google.options({auth: authClient});
        } catch (e) {
            console.error(e);
            // Retry after sleeping for 3 seconds
            await sleep(3000);
            await setGoogleAuth();
        }
    };
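The retry in the snippet above recurses with no upper bound, so a pod that never obtains credentials retries forever. As a minimal sketch of a bounded alternative (the `retryAsync` helper, its option names, and its defaults are hypothetical and not part of googleapis or google-auth-library), the retry could be expressed as a loop that gives up after a fixed number of attempts:

```javascript
// Hypothetical helper, not part of any Google library.
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn` until it succeeds or `attempts` tries have been made.
// Waits `delayMs` between tries and rethrows the last error on failure.
async function retryAsync(fn, { attempts = 5, delayMs = 3000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      console.error(`Attempt ${attempt} of ${attempts} failed:`, err.message);
      if (attempt < attempts) {
        await sleep(delayMs);
      }
    }
  }
  throw lastError;
}
```

With this in place, the credential lookup could be wrapped as `const authClient = await retryAsync(() => auth.getClient());`, which lets the pod fail loudly after the last attempt instead of looping indefinitely.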
Calling the metadata server manually with curl, from a pod that fails to authenticate through the googleapis package, returns a valid token:
curl http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=<my-gcp-endpoint>
Sometimes killing the pods and letting them be recreated works (I am using a Horizontal Pod Autoscaler). Other times the deployment has no problems at all, and at other times recreating the pods doesn't help. The behaviour seems non-deterministic.
Any help would be appreciated, thank you!
Setting DETECT_GCP_RETRIES=3 or K_SERVICE=true in the environment worked.
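The usual place for these variables is the container's `env` section in the Deployment spec. If changing the manifest is not convenient, a rough sketch of setting `DETECT_GCP_RETRIES` from inside the process might look like the following. The `applyAdcWorkaround` helper is hypothetical. It also assumes the auth library reads the variable when it probes the metadata server rather than at import time, which I have not confirmed.

```javascript
// Hypothetical helper, not part of googleapis or google-auth-library.
// Sets DETECT_GCP_RETRIES only when it is not already defined, so a value
// supplied by the Deployment manifest always takes precedence.
// Assumption: the library reads this variable when metadata detection runs.
function applyAdcWorkaround(env = process.env, retries = 3) {
  if (env.DETECT_GCP_RETRIES === undefined) {
    // Environment variables are strings, so store the count as a string.
    env.DETECT_GCP_RETRIES = String(retries);
  }
  return env;
}
```

Calling `applyAdcWorkaround()` once at startup, before `new google.auth.GoogleAuth(...)` is constructed, would be the intended usage under that assumption.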
See full GitHub issue discussion here: https://github.com/googleapis/google-auth-library-nodejs/issues/1236