GKE: Service account for Config Connector lacks permissions

I'm attempting to get Config Connector up and running on my GKE project and am following this getting started guide.

So far I have enabled the appropriate APIs:

> gcloud services enable cloudresourcemanager.googleapis.com
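(A quick way to double-check what's actually enabled, assuming gcloud is pointed at the right project:)

> gcloud services list --enabled | grep -E 'spanner|cloudresourcemanager'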

Created my service account and added policy binding:

> gcloud iam service-accounts create cnrm-system
> gcloud iam service-accounts add-iam-policy-binding cnrm-system@test-connector.iam.gserviceaccount.com --member="serviceAccount:test-connector.svc.id.goog[cnrm-system/cnrm-controller-manager]" --role="roles/iam.workloadIdentityUser"
> kubectl wait -n cnrm-system --for=condition=Ready pod --all
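(For reference, the project-level grant that gives this service account ownership of the project, mentioned further down, would look roughly like this, assuming the account lives in the test-connector project:)

> gcloud projects add-iam-policy-binding test-connector --member="serviceAccount:cnrm-system@test-connector.iam.gserviceaccount.com" --role="roles/owner"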

Annotated my namespace:

> kubectl annotate namespace default cnrm.cloud.google.com/project-id=test-connector
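(To confirm the annotation actually landed, something like this works:)

> kubectl get namespace default -o jsonpath='{.metadata.annotations}'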

And then run through trying to apply the Spanner yaml in the example:

~ >>> kubectl describe spannerinstance spannerinstance-sample                                                                                                                                                                                                                            
Name:         spannerinstance-sample
Namespace:    default
Labels:       label-one=value-one
Annotations:  cnrm.cloud.google.com/management-conflict-prevention-policy: resource
              cnrm.cloud.google.com/project-id: test-connector
API Version:  spanner.cnrm.cloud.google.com/v1beta1
Kind:         SpannerInstance
Metadata:
  Creation Timestamp:  2020-09-18T18:44:41Z
  Generation:          2
  Resource Version:    5805305
  Self Link:           /apis/spanner.cnrm.cloud.google.com/v1beta1/namespaces/default/spannerinstances/spannerinstance-sample
  UID:                 
Spec:
  Config:        northamerica-northeast1-a
  Display Name:  Spanner Instance Sample
  Num Nodes:     1
Status:
  Conditions:
    Last Transition Time:  2020-09-18T18:44:41Z
    Message:               Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
    Reason:                UpdateFailed
    Status:                False
    Type:                  Ready
Events:
  Type     Reason        Age                      From                        Message
  ----     ------        ----                     ----                        -------
  Warning  UpdateFailed  6m41s        spannerinstance-controller  Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
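
For reference, the manifest I applied is essentially the sample from the guide; reconstructed from the describe output above, it looks roughly like this:

apiVersion: spanner.cnrm.cloud.google.com/v1beta1
kind: SpannerInstance
metadata:
  name: spannerinstance-sample
  labels:
    label-one: value-one
spec:
  config: northamerica-northeast1-a
  displayName: Spanner Instance Sample
  numNodes: 1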

I'm not really sure what's going on here, because my cnrm service account has ownership of the project my cluster is in, and I have the APIs listed in the guide enabled.

The CC pods themselves appear to be healthy:

~ >>> kubectl wait -n cnrm-system --for=condition=Ready pod --all                                                                                                                                                                                                                    
pod/cnrm-controller-manager-0 condition met
pod/cnrm-deletiondefender-0 condition met
pod/cnrm-resource-stats-recorder-58cb6c9fc-lf9nt condition met
pod/cnrm-webhook-manager-7658bbb9-kxp4g condition met

Any insight in to this would be greatly appreciated!

There are 2 answers

Mr.KoopaKiller (accepted answer)

Based on the error message you posted, I suspect the problem is with your GKE OAuth scopes.

For GKE to call other GCP APIs, that access must be allowed when the cluster (or node pool) is created. You can check the currently enabled scopes with:

gcloud container clusters describe <cluster-name>

and look for oauthScopes in the result.

The Cloud Spanner documentation lists the scope it requires: at minimum you must enable https://www.googleapis.com/auth/cloud-platform.
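For example, you can pull just the scopes with a --format expression, and, since scopes can't be changed on an existing node pool, create a new pool with the broader scope (cluster, zone, and pool names below are placeholders):

gcloud container clusters describe my-cluster --zone us-central1-a --format="value(nodeConfig.oauthScopes)"

gcloud container node-pools create cloud-platform-pool --cluster my-cluster --zone us-central1-a --scopes=https://www.googleapis.com/auth/cloud-platform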

To verify in the GUI, go to Kubernetes Engine > <Cluster-name>, expand the Permissions section, and look for Cloud Platform.

Jty.tan

I was having that same error message, except it was because the node and the node pool it belonged to did not have the GKE Metadata Server enabled.

To check, first use kubectl to find out which nodes the pods are running on:

kubectl get pods -n cnrm-system -o wide

Then go to the cluster in the GCP UI (easier this way), open the Nodes tab, and for each node pool that a cnrm-system pod is running on, check under Security whether the GKE Metadata Server is enabled.
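
You can also check (and enable) it from the command line; this is a sketch with placeholder names, and the --workload-metadata flag needs a reasonably recent gcloud:

gcloud container node-pools describe my-pool --cluster my-cluster --zone us-central1-a --format="value(config.workloadMetadataConfig.mode)"

gcloud container node-pools update my-pool --cluster my-cluster --zone us-central1-a --workload-metadata=GKE_METADATA

(Note that changing this setting recreates the nodes in the pool.)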

What I found, the first few times, was that I always had one or two node pools that did not have the metadata server enabled, and whenever one of the Config Connector pods ended up on a node in one of those pools, it gave me that exact error.

The note in the docs about needing to enable the metadata server (or Workload Identity) on each node pool is hidden away in a tiny line.

It'd be nice if GCP made it possible to steer these pods towards nodes with a particular label, so that we could direct them to a specific node pool that we'd created with Workload Identity enabled...

Hope that helps!