I’m trying to set up a role for our CI/CD server that can support rolling back a failed deployment. The current permissions work for updating a deployment and related resources and for monitoring their status, but when I attempt to run, for example, “kubectl rollout undo deployment/admin” as the CI user, I get this error:
error: failed to retrieve replica sets from deployment admin: replicasets.apps is forbidden: User "ci-admin" cannot list resource "replicasets" in API group "apps" in the namespace "acceptance"
This was the original role configuration:
# Server role that allows CI to push application deployments to Kubernetes
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-role
  namespace: acceptance
rules:
- apiGroups: ["*"]
  resources: ["deployments"]
  resourceNames: ["admin", "backend", "web"]
  verbs: ["patch", "update", "watch"]
- apiGroups: ["*"]
  resources: ["deployments"]
  verbs: ["get", "list"]
- apiGroups: ["*"]
  resources: ["configmaps"]
  resourceNames: ["admin-fluent-bit-config", "backend-fluent-bit-config", "web-fluent-bit-config"]
  verbs: ["patch", "update", "watch"]
- apiGroups: ["*"]
  resources: ["configmaps"]
  verbs: ["get", "list"]
- apiGroups: ["*"]
  resources: ["horizontalpodautoscalers"]
  resourceNames: ["backend"]
  verbs: ["delete", "patch", "update"]
- apiGroups: ["*"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["create", "get", "list"]
- apiGroups: ["*"]
  resources: ["events", "pods", "pods/log"]
  verbs: ["get", "list"]
I attempted to add permission to get and list replicasets to address the error:
- apiGroups: ["*"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list"]
but I’m still getting the same error as before.
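(For anyone checking their own setup: kubectl auth can-i with impersonation is a quick way to confirm whether a role grants the verb an error complains about. It has to be run from a context that has impersonation rights; the namespace and user below are the ones from the error above.)

kubectl auth can-i list replicasets.apps --namespace acceptance --as ci-admin
kubectl auth can-i update deployments.apps/admin --namespace acceptance --as ci-admin

Each command prints yes or no, which at least confirms whether the Role and RoleBinding are wired up the way you expect.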
The Kubernetes documentation isn’t much help here: there doesn’t appear to be any comprehensive list of which permissions a given command needs, only a handful of examples.
Can anyone tell me what permissions are needed for a rollback?
Update: I tried this again this morning, and it just worked, as originally written.
My best guess as to what was going on is that CI was actually running against a different cluster at the time. Our client moved the acceptance environment to a new AWS account in September, but the old environment wasn’t fully cleaned out until mid-October. While I was making changes to the role in the new cluster from my dev box, the CI server must have still been using the kubectl context for the old cluster, and I didn’t notice because both contexts had the same name. :/
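The takeaway for me is to confirm which cluster each machine is actually talking to before editing RBAC. Since both contexts had the same name, the context name alone wouldn’t have given it away, but the API server URL would have. Running something like this on both the dev box and the CI server would have caught it:

kubectl config current-context
kubectl cluster-info

kubectl cluster-info prints the control plane endpoint, which is what actually differed between the old and new clusters.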
Sorry for the false alarm.