I have set up a Kubernetes HA cluster (stacked etcd) using kubeadm. When I deliberately shut down one master node, the whole cluster goes down and I get this error:
[vagrant@k8s-master01 ~]$ kubectl get nodes
Error from server: etcdserver: request timed out
I am using Nginx as a load balancer in front of the Kubernetes API servers.
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready master 27d v1.19.2 192.168.30.5 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 docker://19.3.11
k8s-master02 Ready master 27d v1.19.2 192.168.30.6 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 docker://19.3.11
k8s-worker01 Ready <none> 27d v1.19.2 192.168.30.10 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 docker://19.3.11
k8s-worker02 Ready <none> 27d v1.19.2 192.168.30.11 <none> CentOS Linux 7 (Core) 3.10.0-1127.19.1.el7.x86_64 docker://19.3.11
[vagrant@k8s-master01 ~]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-wkknl 0/1 Running 9 27d
coredns-f9fd979d6-wp854 1/1 Running 8 27d
etcd-k8s-master01 1/1 Running 46 27d
etcd-k8s-master02 1/1 Running 10 27d
kube-apiserver-k8s-master01 1/1 Running 60 27d
kube-apiserver-k8s-master02 1/1 Running 13 27d
kube-controller-manager-k8s-master01 1/1 Running 20 27d
kube-controller-manager-k8s-master02 1/1 Running 15 27d
kube-proxy-7vn9l 1/1 Running 7 26d
kube-proxy-9kjrj 1/1 Running 7 26d
kube-proxy-lbmkz 1/1 Running 8 27d
kube-proxy-ndbp5 1/1 Running 9 27d
kube-scheduler-k8s-master01 1/1 Running 20 27d
kube-scheduler-k8s-master02 1/1 Running 15 27d
weave-net-77ck8 2/2 Running 21 26d
weave-net-bmpsf 2/2 Running 24 27d
weave-net-frchk 2/2 Running 27 27d
weave-net-zqjzf 2/2 Running 22 26d
[vagrant@k8s-master01 ~]$
Nginx config:
stream {
    upstream apiserver_read {
        server 192.168.30.5:6443;
        server 192.168.30.6:6443;
    }
    server {
        listen 6443;
        proxy_pass apiserver_read;
    }
}
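For context on the "no live upstreams" messages below: nginx's stream proxy uses passive health checks, so after enough connection failures it temporarily marks a server as down. A variant of the config with those checks tuned explicitly might look like this (the parameter values are illustrative assumptions, not recommendations, and tuning them does not fix the underlying etcd quorum loss):

```nginx
stream {
    upstream apiserver_read {
        # mark a server unavailable after 2 failed attempts,
        # then retry it after 10s (illustrative values)
        server 192.168.30.5:6443 max_fails=2 fail_timeout=10s;
        server 192.168.30.6:6443 max_fails=2 fail_timeout=10s;
    }
    server {
        listen 6443;
        proxy_pass apiserver_read;
        # fail fast when an apiserver is unreachable
        proxy_connect_timeout 2s;
    }
}
```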
Nginx logs:
2020/10/19 09:12:01 [error] 1215#0: *12460 no live upstreams while connecting to upstream, client: 192.168.30.11, server: 0.0.0.0:6443, upstream: "apiserver_read", bytes from/to client:0/0, bytes from/to upstream:0/0
2020/10/19 09:12:01 [error] 1215#0: *12465 no live upstreams while connecting to upstream, client: 192.168.30.5, server: 0.0.0.0:6443, upstream: "apiserver_read", bytes from/to client:0/0, bytes from/to upstream:0/0
2020/10/19 09:12:02 [error] 1215#0: *12466 no live upstreams while connecting to upstream, client: 192.168.30.10, server: 0.0.0.0:6443, upstream: "apiserver_read", bytes from/to client:0/0, bytes from/to upstream:0/0
2020/10/19 09:12:02 [error] 1215#0: *12467 no live upstreams while connecting to upstream, client: 192.168.30.11, server: 0.0.0.0:6443, upstream: "apiserver_read", bytes from/to client:0/0, bytes from/to upstream:0/0
2020/10/19 09:12:02 [error] 1215#0: *12468 no live upstreams while connecting to upstream, client: 192.168.30.5, server: 0.0.0.0:6443, upstream: "apiserver_read", bytes from/to client:0/0, bytes from/to upstream:0/0
The reason why etcd times out is that it is a distributed key-value store that needs quorum to stay healthy. This basically means that the members of an etcd cluster vote on every write, and the majority decides. With 3 members you can always lose 1, because the remaining 2 still form a majority. The problem with 2 members is that when 1 goes down, the last etcd member waits for a majority vote before committing anything, which can never happen. This is why you always need an odd number of master nodes (at least 3 for HA) in a Kubernetes cluster.
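The quorum arithmetic above can be sketched in a few lines (a generic illustration of majority quorum, not code from etcd itself):

```python
# Quorum for a cluster of n members is a strict majority: floor(n/2) + 1.
# The cluster stays available as long as a quorum of members is up,
# so it tolerates n - quorum(n) member failures.

def quorum(n: int) -> int:
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    return n - quorum(n)

for n in range(1, 6):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")

# A 2-member cluster tolerates 0 failures: losing 1 member leaves the
# survivor below quorum (1 of 2), which is exactly the situation here.
```

This is also why adding a second master did not add any fault tolerance: fault_tolerance(2) is 0, the same as a single master, while a 3-member cluster tolerates 1 failure.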