I have deployed OpenShift (OKD) 3.11 using https://github.com/openshift/openshift-ansible/tree/release-3.11. I want to reproduce a scenario where the certificates expire and test how they can be renewed.
To do that, I set the following variables in the inventory to 1 day, so that the certificates expire quickly:
openshift_hosted_registry_cert_expire_days=1
openshift_ca_cert_expire_days=1
openshift_master_cert_expire_days=1
etcd_ca_default_days=1
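To confirm that the certificates really expire after one day, the expiry dates can be read directly on the master. This is a minimal sketch: the certificate paths are the usual OKD 3.11 locations and are an assumption here, and cert_status is a hypothetical helper name.

```shell
#!/bin/sh
# cert_status: hypothetical helper that reports whether a PEM certificate
# is still valid, has expired, or cannot be read.
cert_status() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "missing: $cert"
        return 2
    fi
    # "openssl x509 -checkend 0" exits 0 if the certificate is still valid
    # right now, and non-zero if it has already expired.
    if openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        echo "valid: $cert ($(openssl x509 -noout -enddate -in "$cert"))"
    else
        echo "expired: $cert ($(openssl x509 -noout -enddate -in "$cert"))"
        return 1
    fi
}

# Assumed default certificate locations on an OKD 3.11 master; adjust as needed.
for f in /etc/origin/master/ca.crt \
         /etc/origin/master/master.server.crt \
         /etc/etcd/ca.crt \
         /etc/etcd/server.crt \
         /etc/etcd/peer.crt; do
    cert_status "$f" || true
done
```

Any file reported as expired would be one that the redeploy playbook needs to replace.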
As expected, after 1 day the oc commands were not working and the master-api and master-etcd pods were in the Exited state. To renew all the certificates, I ran the redeploy-certificates playbook, following https://docs.openshift.com/container-platform/3.11/install_config/redeploying_certificates.html#redeploying-all-certificates-current-ca:
ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/redeploy-certificates.yml
But the playbook aborts with this error:
.
.
.
.
TASK [Wait for master to restart] **********************************************************************************************************
skipping: [master.167.254.204.228.nip.io]
TASK [Wait for master API to come back online] *********************************************************************************************
skipping: [master.167.254.204.228.nip.io]
TASK [openshift_control_plane : restart master] ********************************************************************************************
changed: [master.167.254.204.228.nip.io] => (item=api)
changed: [master.167.254.204.228.nip.io] => (item=controllers)
RUNNING HANDLER [openshift_control_plane : verify API server] ******************************************************************************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
.
.
.
FAILED - RETRYING: verify API server (2 retries left).
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.167.254.204.228.nip.io]: FAILED! => {
"attempts": 120,
"changed": false,
"cmd": [
"curl",
"--silent",
"--tlsv1.2",
"--max-time",
"2",
"--cacert",
"/etc/origin/master/ca-bundle.crt",
"https://master.167.254.204.228.nip.io:8443/healthz/ready"
],
"delta": "0:00:00.012426",
"end": "2020-11-29 22:56:24.445762",
"rc": 7,
"start": "2020-11-29 22:56:24.433336"
}
MSG:
non-zero return code
RUNNING HANDLER [openshift_control_plane : verify Local API server] ************************************************************************
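The "rc": 7 in the failed task is curl's exit code for "failed to connect to host", which suggests the API server was not listening at all rather than rejecting the CA bundle. A small sketch for repeating the handler's check by hand and decoding the result; explain_curl_rc and check_api are hypothetical helper names, and only a few exit codes from the curl man page are covered.

```shell
#!/bin/sh
# explain_curl_rc: hypothetical helper mapping a few curl exit codes
# (from the EXIT CODES section of the curl man page) to short descriptions.
explain_curl_rc() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "failed to connect to host" ;;
        28) echo "operation timed out" ;;
        35) echo "SSL connect error" ;;
        60) echo "peer certificate cannot be authenticated with known CA certificates" ;;
        *)  echo "other curl error" ;;
    esac
}

# check_api: run the same curl command as the Ansible handler and explain
# its exit code. Arguments: URL, path to the CA bundle.
check_api() {
    url="$1"
    cacert="$2"
    rc=0
    curl --silent --tlsv1.2 --max-time 2 --cacert "$cacert" "$url" >/dev/null || rc=$?
    echo "curl rc=$rc: $(explain_curl_rc "$rc")"
}

# Usage, with the values from the failed task above:
# check_api https://master.167.254.204.228.nip.io:8443/healthz/ready /etc/origin/master/ca-bundle.crt
```

An exit code of 60 or 35 would instead point at a certificate or TLS problem on the API endpoint itself.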
Please let me know if I'm missing anything while redeploying the certificates, or if there is an alternative way to renew them.
Update
I have also tried the redeploy-openshift-ca.yml playbook with -e openshift_redeploy_openshift_ca=true:
ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml -e openshift_redeploy_openshift_ca=true
But this play also fails at the same task as before, where it waits for the master API to come up.
The master-api docker logs show:
.
.
I1202 18:02:55.930375 1 plugins.go:84] Registered admission plugin "SecurityContextDeny"
I1202 18:02:55.930387 1 plugins.go:84] Registered admission plugin "ServiceAccount"
I1202 18:02:55.930396 1 plugins.go:84] Registered admission plugin "DefaultStorageClass"
I1202 18:02:55.930408 1 plugins.go:84] Registered admission plugin "PersistentVolumeClaimResize"
I1202 18:02:55.930418 1 plugins.go:84] Registered admission plugin "StorageObjectInUseProtection"
F1202 18:03:25.933354 1 start_api.go:68] dial tcp 167.254.204.228:2379: connect: connection refused
The etcd docker logs show:
2020-12-02 18:05:14.459240 I | embed: ready to serve client requests
2020-12-02 18:05:14.459730 I | embed: serving client requests on 167.254.204.228:2379
WARNING: 2020/12/02 18:05:14 Failed to dial 167.254.204.228:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
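The "tls: bad certificate" message might mean that the etcd certificates are expired or no longer match the etcd CA. A minimal sketch for checking that with openssl verify; the file paths are the usual OKD 3.11 locations and are assumptions, and verify_chain is a hypothetical helper name.

```shell
#!/bin/sh
# verify_chain: hypothetical helper that checks a certificate against a CA file.
# "openssl verify" fails if the certificate is expired or was not signed by that CA.
verify_chain() {
    ca="$1"
    cert="$2"
    if [ ! -r "$ca" ] || [ ! -r "$cert" ]; then
        echo "missing: $ca or $cert"
        return 2
    fi
    if openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
        echo "verify ok: $cert"
    else
        echo "verify failed: $cert"
        return 1
    fi
}

# Assumed default locations on an OKD 3.11 master; adjust as needed.
verify_chain /etc/etcd/ca.crt /etc/etcd/server.crt || true
verify_chain /etc/etcd/ca.crt /etc/etcd/peer.crt || true
verify_chain /etc/origin/master/master.etcd-ca.crt /etc/origin/master/master.etcd-client.crt || true
```

If any of these fail, the etcd certificates would probably have to be redeployed before the master API can connect to etcd on port 2379.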