microk8s containerd - failed to reserve sandbox name

I have microk8s running on two nodes. Recently it got into a state where the master node fails to reach Ready status because the microk8s.daemon-containerd service fails to start. This began after I tried to get a cert-manager configuration running in the k8s cluster.

As far as I can see, the cert-manager-webhook pod is running fine on the second node.

I have tried microk8s stop followed by microk8s start, and at this point even microk8s reset, but containerd always shows the same error.

Outputs:

$ kubectl get node
NAME        STATUS     ROLES    AGE   VERSION
pi-k8s-00   NotReady   <none>   77d   v1.18.6-1+b4f4cb0b7fe3c1
pi-k8s-01   Ready      <none>   77d   v1.19.2-34+37bbd8cebecb60
$ kubectl get pod -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-676b755d5f-6bjxv              1/1     Running   0          12m
cert-manager-cainjector-795f67b984-tsmw9   1/1     Running   3          12m
cert-manager-webhook-86c4dcd4b5-bgrmb      1/1     Running   0          12m
$ sudo journalctl -u snap.microk8s.daemon-containerd
...
Oct 17 10:42:33 pi-k8s-00 microk8s.daemon-containerd[44363]: time="2020-10-17T10:42:33.848409047Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook>
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Scheduled restart job, restart counter is at 5.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Start request repeated too quickly.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Failed to start Service for snap application microk8s.daemon-containerd.
$ uname -a
Linux pi-k8s-00 5.4.0-1021-raspi #24-Ubuntu SMP PREEMPT Mon Oct 5 09:59:23 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

How can I get the master node back in a good running/ready state?

--- UPDATE ---

Output:

$ less /var/snap/microk8s/current/inspection-report/snap.microk8s.daemon-containerd/journal.log
Oct 18 14:48:03 pi-k8s-00 microk8s.daemon-containerd[239043]: time="2020-10-18T14:48:03.936439781Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\": name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\" is reserved for \"149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c\""

This last log line shows that the sandbox name is reserved for a given ID.

What does that ID refer to? And where should I look, and what should I do, to free it up?

Following the comments in the issue "'failed to reserve sandbox name' error after hard reboot" (#1014), I tried:

$ sudo ctr -n=k8s.io containers info 149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c
ctr: container "149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c" in namespace "k8s.io": not found

But as the output shows, no container with that ID exists.
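
For reference, the sandbox name and the ID it is reserved for can be pulled out of such a log line mechanically. The following is only a minimal sketch: parse_reserved is a hypothetical helper name, and the pattern assumes the exact message format shown in the journal output above (escaped quotes included).

```shell
#!/bin/sh
# Hypothetical helper: read containerd journal lines on stdin and print
# "<sandbox name> <reserved id>" for each "failed to reserve sandbox name" error.
# Assumes the message format shown in the log above, where quotes appear as \".
parse_reserved() {
    sed -n 's/.*failed to reserve sandbox name \\"\([^\\]*\)\\".*is reserved for \\"\([0-9a-f]*\)\\".*/\1 \2/p'
}
```

It could be fed from the journal, for example with sudo journalctl -u snap.microk8s.daemon-containerd --no-pager | parse_reserved | tail -n 1, which also avoids the pager truncating long lines.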

1 answer

Answered by Going Bananas

It seems the containerd data had become corrupted, so the way to resolve this issue was to move the old containerd data directory aside and let microk8s recreate it on start:

$ microk8s.stop
$ sudo mv /var/snap/microk8s/common/var/lib/containerd /var/snap/microk8s/common/var/lib/_containerd
$ microk8s.start
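
The same steps can be wrapped in a small script that keeps the old data under a timestamped name rather than a fixed one. This is a sketch under assumptions, not something I ran in this form: move_aside and reset_containerd_state are hypothetical names, the .bak.<timestamp> suffix is my own choice, and the path is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: move a directory aside under a timestamped name
# instead of deleting it. Prints the new path on success; fails if the
# source is missing or the backup name is already taken.
move_aside() {
    src=$1
    dst="${src}.bak.$(date +%Y%m%d%H%M%S)"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "backup already exists: $dst" >&2; return 1; }
    mv -- "$src" "$dst" && printf '%s\n' "$dst"
}

# The full sequence from the steps above. Defined only, never called here;
# it needs root and stops the node while it runs.
reset_containerd_state() {
    microk8s stop &&
    move_aside /var/snap/microk8s/common/var/lib/containerd &&
    microk8s start
}
```

After sourcing the file, calling reset_containerd_state as root performs the stop, move and start in order and aborts at the first step that fails. The old directory is kept, so it can be inspected or removed later.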

The Kubernetes master node once again shows status Ready:

$ kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
pi-k8s-00   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60
pi-k8s-01   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60

See my post on the microk8s GitHub issues page, "Failed to Reserve Sandbox Name", for more details.