Unable to redeploy the certificates post-expiry in openshift 3.11

2.9k views Asked by At

I have deployed openshift(okd) 3.11 using : https://github.com/openshift/openshift-ansible/tree/release-3.11 I would want to produce a scenario where certificates expire and test how the renewal certificates can be done.

Hence I have set following variables in the inventory as 1 day(so that certificates expire quickly):

openshift_hosted_registry_cert_expire_days=1
openshift_ca_cert_expire_days=1
openshift_master_cert_expire_days=1
etcd_ca_default_days=1

As expected after 1 day the oc commands where not working and master-api, master-etcd pods where in exited state. Now i wanted to renew all the certificates hence i have run the re-deploy certificate play referring to https://docs.openshift.com/container-platform/3.11/install_config/redeploying_certificates.html#redeploying-all-certificates-current-ca

ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/redeploy-certificates.yml

But the this ansible play gets aborted with error:

.
.
.
.
TASK [Wait for master to restart] **********************************************************************************************************
skipping: [master.167.254.204.228.nip.io]

TASK [Wait for master API to come back online] *********************************************************************************************
skipping: [master.167.254.204.228.nip.io]

TASK [openshift_control_plane : restart master] ********************************************************************************************
changed: [master.167.254.204.228.nip.io] => (item=api)
changed: [master.167.254.204.228.nip.io] => (item=controllers)

RUNNING HANDLER [openshift_control_plane : verify API server] ******************************************************************************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
.
.
.
FAILED - RETRYING: verify API server (2 retries left).
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.167.254.204.228.nip.io]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "--silent",
        "--tlsv1.2",
        "--max-time",
        "2",
        "--cacert",
        "/etc/origin/master/ca-bundle.crt",
        "https://master.167.254.204.228.nip.io:8443/healthz/ready"
    ],
    "delta": "0:00:00.012426",
    "end": "2020-11-29 22:56:24.445762",
    "rc": 7,
    "start": "2020-11-29 22:56:24.433336"
}

MSG:

non-zero return code


RUNNING HANDLER [openshift_control_plane : verify Local API server] ************************************************************************


Please let me know if im missing out anything while re-deploying certificates or any alternate way where we can renew these certificates.

Update

I have also tried the redeploy-openshift-ca.yml playbook with -e openshift_redeploy_openshift_ca=true:

ansible-playbook -i openshift-ansible/playbooks/inventory.ini openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml -e openshift_redeploy_openshift_ca=true

But this play too fails at the same task as earlier where it is waiting for master-api to be running.

The master-api docker logs shows:

.
.
I1202 18:02:55.930375       1 plugins.go:84] Registered admission plugin "SecurityContextDeny"
I1202 18:02:55.930387       1 plugins.go:84] Registered admission plugin "ServiceAccount"
I1202 18:02:55.930396       1 plugins.go:84] Registered admission plugin "DefaultStorageClass"
I1202 18:02:55.930408       1 plugins.go:84] Registered admission plugin "PersistentVolumeClaimResize"
I1202 18:02:55.930418       1 plugins.go:84] Registered admission plugin "StorageObjectInUseProtection"
F1202 18:03:25.933354       1 start_api.go:68] dial tcp 167.254.204.228:2379: connect: connection refused

The etcd docker logs shows:

2020-12-02 18:05:14.459240 I | embed: ready to serve client requests
2020-12-02 18:05:14.459730 I | embed: serving client requests on 167.254.204.228:2379
WARNING: 2020/12/02 18:05:14 Failed to dial 167.254.204.228:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

0

There are 0 answers