Terraform GKE problem pulling from private gcr


As I did not get anywhere with a standard GKE cluster via Terraform (see GKE permission issue on gcr.io with service account based on terraform), I have now created one with a separate node pool. However, I still cannot get a basic container pulled from an eu.gcr.io private repo.

My Terraform configuration is as follows.

    resource "google_container_cluster" "primary" {
      name     = "gke-cluster"
      location = "${var.region}-a"

      node_locations = [
        "${var.region}-b",
        "${var.region}-c",
      ]

      network     = var.vpc_name
      subnetwork  = var.subnet_name

      remove_default_node_pool = true
      initial_node_count       = 1
      # minimum kubernetes version for master
      min_master_version = var.min_master_version

      master_auth {
        username = var.gke_master_user
        password = var.gke_master_pass
      }

    }

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name     = "gke-node-pool"
  location = "${var.region}-a"

  cluster     = google_container_cluster.primary.name
  version     = var.node_version
  node_count  = 3

  node_config {
    preemptible  = true

    metadata = {
      disable-legacy-endpoints = "true"
    }

    # based on project number
    service_account = "[email protected]"

    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

All of this creates very nicely. I then deploy on the cluster with the following manifest (deployment.yml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: api
  template:
    metadata:
      labels:
        component: api
    spec:
      containers:
      - name: api
        image: eu.gcr.io/project-dev/api:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 5060

and it continues to give:

Failed to pull image "eu.gcr.io/project-dev/api:latest": rpc error: code = 
Unknown desc = Error response from daemon: pull access denied for eu.gcr.io/project-dev/api, 
repository does not exist or may require 'docker login': denied: Permission denied for 
"latest" from request "/v2/project-dev/lcm_api/manifests/latest".

Warning Failed 94s (x2 over 111s) kubelet, gke-cluster-dev-node-pool-90efd247-7vl4 Error: ErrImagePull

I have opened Cloud Shell against the Kubernetes cluster, and

docker pull eu.gcr.io/project-dev/api:latest 

works just fine.

I am seriously running out of ideas here (and am considering moving back to AWS). Could it have something to do with the permissions under which the container was pushed to eu.gcr.io?

I use:

docker login -u _json_key --password-stdin https://eu.gcr.io < /home/jeroen/.config/gcloud/tf_admin.json

locally, where tf_admin.json is the service account key of the administration project that created the infrastructure project. I then push with

docker push eu.gcr.io/project-dev/api:latest   

Another idea. From the documentation and other stackoverflow questions (see e.g. GKE - ErrImagePull pulling from Google Container Registry) it seems key to have the correct service account and oauth-scopes. How can I check that it is using the right service-account when pulling? And whether the scopes are correctly assigned?
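For anyone else checking the same thing: both the service account and the scopes of the node pool can be inspected with gcloud. This is a sketch assuming gcloud is configured for the cluster's project; `$ZONE` stands in for the `"${var.region}-a"` zone from the Terraform above:

```shell
# Which service account is attached to the node pool?
gcloud container node-pools describe gke-node-pool \
  --cluster gke-cluster --zone "$ZONE" \
  --format="value(config.serviceAccount)"

# Which OAuth scopes were granted when the pool was created?
gcloud container node-pools describe gke-node-pool \
  --cluster gke-cluster --zone "$ZONE" \
  --format="value(config.oauthScopes)"

# From an SSH session on one of the nodes, the metadata server reports
# the scopes the instance actually runs with:
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"
```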

There are 2 answers

Alex Vorona On

It seems the official Terraform example with OAuth scopes is outdated and shouldn't be used. My fix is to grant all permissions via OAuth scopes and manage access with IAM roles instead:

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]

You can also check this similar issue.
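For illustration, a sketch of that approach using the names from the question; the `google_project_iam_member` resource is an assumption about how the IAM side could be wired into the same configuration:

```hcl
resource "google_container_node_pool" "primary_preemptible_nodes" {
  # ... other arguments as in the question ...
  node_config {
    service_account = "[email protected]"
    # Broad scope; actual access is then narrowed with IAM roles.
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}

# Read access to the GCS bucket backing eu.gcr.io for the node
# service account (project name taken from the question).
resource "google_project_iam_member" "gcr_pull" {
  project = "project-dev"
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:[email protected]"
}
```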

leberknecht On

Maybe someone finds this useful: in my case, the service account for the cluster was missing the roles/storage.objectViewer role.
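A hedged sketch of granting that role from the command line, using the project and node service account as they appear in the question:

```shell
# Give the node service account read access to the storage bucket
# that backs the eu.gcr.io registry.
gcloud projects add-iam-policy-binding project-dev \
  --member="serviceAccount:[email protected]" \
  --role="roles/storage.objectViewer"
```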