I am currently setting up ArgoCD in a hub model to manage deployments across multiple AKS clusters. I've chosen to use Managed Identities for authentication as outlined here: https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#aks
Despite successfully configuring the necessary roles, permissions, and annotations, and verifying manually that the argocd-k8s-auth command retrieves a token inside the ArgoCD pod, deployments fail with the error: getting credentials: exec: executable argocd-k8s-auth failed with exit code 20.
The relevant portion of the error log is as follows:
ComparisonError: Failed to load live state: failed to get cluster info for "https://xxxx-dn4ng0go.hcp.westeurope.azmk8s.io:443": error synchronizing cache state: Get "https://xxxx-dn4ng0go.hcp.northeurope.azmk8s.io:443/version?timeout=32s": getting credentials: exec: executable argocd-k8s-auth failed with exit code 20
Steps Taken
- Assigned the Azure Kubernetes Service RBAC Cluster Admin role to ArgoCDManagedIdentity.
- Annotated the argocd service account with the Managed Identity's client ID and tenant ID.
- Confirmed that the environment variables for the Azure Managed Identity are correctly injected into the ArgoCD pods.
- Manually tested argocd-k8s-auth within the pod, successfully retrieving a token (a rough sketch of these commands follows this list).
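For reference, this is approximately what the steps above look like as commands (subscription, resource group, and cluster names are placeholders, and I'm assuming argocd-server as the annotated service account; the annotation keys come from the Azure Workload Identity docs):

# Grant the managed identity admin access on the target (spoke) cluster
az role assignment create \
  --assignee "<MID_CLIENT_ID>" \
  --role "Azure Kubernetes Service RBAC Cluster Admin" \
  --scope "/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.ContainerService/managedClusters/<TARGET_CLUSTER>"

# Annotate the ArgoCD service account on the hub cluster with the identity
kubectl -n argocd annotate serviceaccount argocd-server \
  azure.workload.identity/client-id="<MID_CLIENT_ID>" \
  azure.workload.identity/tenant-id="<TENANT_ID>"

# Note: the pod template also carries the label azure.workload.identity/use: "true"
# so the webhook injects the AZURE_* env vars and the projected token file.

# Manual test inside the pod: prints an ExecCredential JSON containing a token
kubectl -n argocd exec -it deploy/argocd-server -- \
  env AAD_LOGIN_METHOD=workloadidentity argocd-k8s-auth azure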
In fact, if I opt to use an SPN as outlined in the same ArgoCD documentation, cluster registration succeeds, as does deployment. Here are the configurations of the two cluster secrets: one uses the SPN and the other uses workload identity.
apiVersion: v1
kind: Secret
metadata:
  name: success-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: success-cluster
  server: <URL>
  config: |
    {
      "execProviderConfig": {
        "command": "argocd-k8s-auth",
        "env": {
          "AAD_ENVIRONMENT_NAME": "AzurePublicCloud",
          "AAD_SERVICE_PRINCIPAL_CLIENT_SECRET": "<SECRET>",
          "AZURE_TENANT_ID": "<TENANT_ID>",
          "AAD_SERVICE_PRINCIPAL_CLIENT_ID": "<CLIENT_ID>",
          "AAD_LOGIN_METHOD": "spn"
        },
        "args": ["azure"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<CA_DATA>"
      }
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: fail-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: fail-cluster
  server: <URL>
  config: |
    {
      "execProviderConfig": {
        "command": "argocd-k8s-auth",
        "env": {
          "AAD_ENVIRONMENT_NAME": "AzurePublicCloud",
          "AZURE_CLIENT_ID": "<MID_CLIENT_ID>",
          "AZURE_TENANT_ID": "<TENANT_ID>",
          "AZURE_FEDERATED_TOKEN_FILE": "/var/run/secrets/azure/tokens/azure-identity-token",
          "AZURE_AUTHORITY_HOST": "https://login.microsoftonline.com/",
          "AAD_LOGIN_METHOD": "workloadidentity"
        },
        "args": ["azure"],
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<CA_DATA>"
      }
    }
Both the SPN and the managed identity are assigned the same permissions on the target cluster. For extra measure, I elevated the managed identity's permissions to Owner and validated that the federated credential between the argocd-server service account and the identity was created successfully.
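For completeness, this is roughly how I validated the federation (identity, resource group, and cluster names are placeholders; the subject must match the hub cluster's namespace and service account exactly):

# List federated credentials on the identity; the subject should be
# system:serviceaccount:argocd:argocd-server and the issuer should match the hub
az identity federated-credential list \
  --identity-name ArgoCDManagedIdentity \
  --resource-group <RG> \
  --query "[].{issuer:issuer,subject:subject}" -o table

# The hub cluster's OIDC issuer URL, for comparison with the credential's issuer
az aks show -n <HUB_CLUSTER> -g <RG> \
  --query oidcIssuerProfile.issuerUrl -o tsv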
What am I missing?