How to start up an old k8s cluster without losing pods and data?


I have an old k8s cluster with 1 master and 2 worker nodes. It was shut down for a long time. Now I have started it again. It had many running pods and deployments. After restarting the VMs, every k8s command returns

The connection to the server 123.70.70.70:6443 was refused - did you specify the right host or port?

What have I done so far? I looked at many Stack Overflow questions about this error, as well as issues on GitHub and some other sites. All of them require kubeadm reset. If I reset, I will lose all the running pods, and I don't know how to start those pods again since they were not deployed by me.

What do I want? Is there a way I can get all the pods and nodes up and running without a reset? Or, even if I do reset, how can I get all the pods back to their running state? This cluster was designed and set up by someone else, and I have no idea about its deployments.

Update Question

When I run docker ps -a | grep api I see this

1a6ba468af3a   123.70.70.70:444/demo/webapikl     "dotnet UserProfileA…"    6 months ago    Exited (255) 22 hours ago                                                                                  k8s_webapikl-image_webapikl-deployment-74648888d5-bjpcj_fwd_13a76dd9-8ce5

There are many containers like this. Any advice on how to start them, please?

I am new to k8s, which is why I would like to be sure before I do anything.


There are 4 answers

2
Kranthiveer Dontineni On

First, let me explain the error: since you have restarted your servers, or nodes in Kubernetes terms, if the IP addresses assigned to these nodes are not static, the previous cluster configuration will no longer work and your cluster enters panic mode. Refer to this doc for getting your cluster up and running again.
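A quick sanity check, assuming a standard kubeadm layout where the static pod manifests live under /etc/kubernetes/manifests, is to compare the address currently assigned to the master node with the address the API server was configured to advertise:

# Addresses currently assigned to this node
ip -4 addr show

# Address the API server advertises (default kubeadm manifest path)
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml

If the two no longer match, the existing certificates and kubeconfig files point at an address that no longer exists, which fits the connection refused error above.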

Once your cluster is up and running, you can use kubectl commands to list all the services, deployments and namespaces. Export all of these objects as YAML files and store them as backups.
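For example, a minimal set of backup exports, assuming you only need the most common workload objects (the file names here are just placeholders):

# Export common workload objects from every namespace as YAML
kubectl get deployments,services,configmaps,secrets --all-namespaces -o yaml > workloads-backup.yaml

# Export the namespace definitions themselves
kubectl get namespaces -o yaml > namespaces-backup.yaml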

If you are taking downtime and trying to restart your pods, it won't cause any data loss or application failure. This document provides details on how to restart multiple pods at the same time, but in general frequent restarts are not recommended. I hope this addresses your query; if you can explain why you are planning to restart your cluster, I can try to provide a more accurate solution.
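As a sketch, assuming the workloads are managed by deployments, a rolling restart per namespace would look like this (the namespace name is a placeholder):

# Rolling-restart every deployment in one namespace; pods are recreated,
# data on persistent volumes is not touched
kubectl rollout restart deployment -n <namespace>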

1
baatasaari On

Based on what you mentioned, the API server component of the cluster is not working as expected. This can either be an issue with the API server itself failing to start, or with it failing to reach the etcd component.

Log in to the master node and, depending on the container runtime, check whether the containers are running well, especially the API server and etcd. If you do not see the containers running, use the -a option to list stopped ones as well. For example, with Docker use:

docker ps -a | grep api 
or 
docker ps -a | grep etcd

Once you find the container, get its logs; they should give you a clue as to why the API server component is not starting up. Based on what you see, you can update your question with those log entries.
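For example, with Docker again (the container ID is a placeholder taken from the docker ps output):

# Show the last lines of the failing container's log
docker logs --tail 100 <container-id>

# The exited API server container can be found the same way
docker ps -a | grep kube-apiserver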

0
AudioBubble On

I am really thankful for your time and effort. What worked for me is this Stack Overflow answer, along with some changes.

In my case, when I ran systemctl status kubelet I saw this error:

devops@kubemaster:/$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Wed 2023-01-11 12:51:04 EET; 9s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 188116 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXT>
   Main PID: 188116 (code=exited, status=1/FAILURE)

The kubelet was stuck at activating.

I followed these steps from the mentioned answer:

$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/
$ kubeadm init phase kubeconfig all
$ reboot

I also had to delete the etcd .crt and .key files from /etc/kubernetes/pki/etcd/, as mentioned in one of the comments.
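For reference, a rough sketch of that extra step, assuming you move the old etcd certificates aside as a backup rather than deleting them (the backup directory name is arbitrary):

$ mkdir -p ~/etcd-pki-backup
$ mv /etc/kubernetes/pki/etcd/*.crt /etc/kubernetes/pki/etcd/*.key ~/etcd-pki-backup/
# Re-running this regenerates the missing etcd certificates as well
$ kubeadm init phase certs all --apiserver-advertise-address <IP>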

This brought the kubelet into the active state. Then I generated a new join command and joined all the worker nodes to the master node one by one. Once all the nodes were ready, I deleted the pods stuck in Terminating and CrashLoopBackOff, and Kubernetes recreated them on the other worker nodes. Now all the pods are working without any issue.
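For anyone following the same path, the join and cleanup commands were roughly these (pod and namespace names are placeholders):

# On the master: print a fresh join command with a new token
$ kubeadm token create --print-join-command

# Run the printed kubeadm join command on each worker node, then back on the master
# force-delete the pods stuck in Terminating or CrashLoopBackOff
$ kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force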

1
Sibtain On

The error you are getting usually occurs when the KUBECONFIG environment variable is not exported. Run the following commands as a regular user, or run the last command as root.

sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

Refer to my SO answer here.

Now that you are able to run kubectl commands, you should be able to see any pods that were created as control plane components or as workloads. Use the following command to see the nodes that are part of your cluster:

kubectl get nodes

Make sure to verify that all the control plane components are running fine as well:

kubectl get pods -n kube-system
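
If the cluster was set up with kubeadm, the static control plane pods should also carry the tier=control-plane label, so you can narrow the output down to just those components:

# Only the API server, controller manager, scheduler and etcd static pods
kubectl get pods -n kube-system -l tier=control-plane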