I am provisioning a workload cluster with one control plane node and one worker node on top of OpenStack via Cluster API. However, the Kubernetes control plane fails to come up properly on the control plane node.
I can see the kube-apiserver container keeps exiting and being recreated:
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
a729fdd387b0a 90d27391b7808 About a minute ago Running kube-apiserver 74 88de61a0459f6
38b54a71cb0aa 90d27391b7808 3 minutes ago Exited kube-apiserver 73 88de61a0459f6
24573a1c5adc5 b0f1517c1f4bb 18 minutes ago Running kube-controller-manager 4 cc113aaae13b5
a2072b64cca1a b0f1517c1f4bb 29 minutes ago Exited kube-controller-manager 3 cc113aaae13b5
f26a531972518 d109c0821a2b9 5 hours ago Running kube-scheduler 1 df1d15fd61a8f
a91b4c0ce9e27 303ce5db0e90d 5 hours ago Running etcd 1 16e1f0f5bb543
1565a1a7dedec 303ce5db0e90d 5 hours ago Exited etcd 0 16e1f0f5bb543
35ae23eb64f11 d109c0821a2b9 5 hours ago Exited kube-scheduler 0 df1d15fd61a8f
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$
From the kube-apiserver container's log I can see "http: TLS handshake error from 172.24.4.159:50812: EOF":
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock logs -f a729fdd387b0a
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0416 20:32:25.730809 1 server.go:596] external host was not specified, using 10.6.0.9
I0416 20:32:25.744220 1 server.go:150] Version: v1.17.3
......
......
I0416 20:33:46.816189 1 dynamic_cafile_content.go:166] Starting request-header::/etc/kubernetes/pki/front-proxy-ca.crt
I0416 20:33:46.816832 1 dynamic_cafile_content.go:166] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
I0416 20:33:46.833031 1 dynamic_serving_content.go:129] Starting serving-cert::/etc/kubernetes/pki/apiserver.crt::/etc/kubernetes/pki/apiserver.key
I0416 20:33:46.853958 1 secure_serving.go:178] Serving securely on [::]:6443
......
......
I0416 20:33:51.784715 1 log.go:172] http: TLS handshake error from 172.24.4.159:60148: EOF
I0416 20:33:51.786804 1 log.go:172] http: TLS handshake error from 172.24.4.159:60150: EOF
I0416 20:33:51.788984 1 log.go:172] http: TLS handshake error from 172.24.4.159:60158: EOF
I0416 20:33:51.790695 1 log.go:172] http: TLS handshake error from 172.24.4.159:60210: EOF
I0416 20:33:51.792577 1 log.go:172] http: TLS handshake error from 172.24.4.159:60214: EOF
I0416 20:33:51.793861 1 log.go:172] http: TLS handshake error from 172.24.4.159:60202: EOF
I0416 20:33:51.805506 1 log.go:172] http: TLS handshake error from 10.6.0.9:35594: EOF
I0416 20:33:51.806056 1 log.go:172] http: TLS handshake error from 172.24.4.159:60120: EOF
......
From syslog I can see the apiserver serving cert is signed for the IP 172.24.4.159:
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ grep "apiserver serving cert is signed for DNS names" /var/log/syslog
Apr 16 15:25:56 ubu1910-medflavor-nolb3-control-plane-nh4hf cloud-init[652]: [certs] apiserver serving cert is signed for DNS names [ubu1910-medflavor-nolb3-control-plane-nh4hf kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.6.0.9 172.24.4.159]
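For reference, the same SANs can be read directly off the serving cert (assuming the standard kubeadm path):
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'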
And from syslog I can also see that the kubelet service can't reach the apiserver due to "net/http: TLS handshake timeout":
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ tail -F /var/log/syslog
Apr 16 19:36:18 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:18.596206 1504 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get https://172.24.4.159:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:19.202346090Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:19.274089 1504 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Apr 16 19:36:20 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: W0416 19:36:20.600457 1504 status_manager.go:530] Failed to get status for pod "kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf_kube-system(24ec7abb1b94172adb053cf6fdd1648c)": Get https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf: net/http: TLS handshake timeout
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:24.336699210Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:24.379374 1504 controller.go:135] failed to ensure node lease exists, will retry in 7s, error: Get https://172.24.4.159:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubu1910-medflavor-nolb3-control-plane-nh4hf?timeout=10s: context deadline exceeded
......
......
I also tried to access the apiserver with curl, and I see:
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl http://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
Client sent an HTTP request to an HTTPS server.
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$
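For what it's worth, the second failure looks expected, since curl doesn't trust the cluster CA. Pointing it at the CA file should complete the handshake whenever the apiserver is actually up:
curl --cacert /etc/kubernetes/pki/ca.crt https://172.24.4.159:6443/version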
Is there something wrong with the kube-apiserver's certificate? Any idea how I can continue troubleshooting?
If you want to see details of your kube-api SSL cert you can use:
curl -k -v https://172.24.4.159:6443
or:
openssl s_client -connect 172.24.4.159:6443
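For example, to print just the subject, issuer, and validity dates of the live serving cert (a quick sanity check, assuming the openssl CLI is available on the node):
openssl s_client -connect 172.24.4.159:6443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates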
You didn't mention how you're provisioning your certificates. SSL in Kubernetes is a complicated beast, and setting up the certificates and all the communication paths manually can be very painful. That's the reason why people use kubeadm nowadays.
TL;DR: You must ensure all the certificates are signed by /etc/kubernetes/pki/ca.crt.
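A quick way to check that, assuming the standard kubeadm PKI layout under /etc/kubernetes/pki:
sudo openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt
sudo openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver-kubelet-client.crt
Each command should print OK; anything else means that cert doesn't chain to the cluster CA.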
Since you're mentioning a 'single node', I assume the kubelet is running as a systemd unit on the same server? How is that kube-api container launched? By the kubelet process itself, because you have pod definitions in /etc/kubernetes/manifests?
There are actually two ways of communication between kubelet and kube-api, and they are both used at the same time:
1. kubelet connects and authenticates to kube-api using the information from its --kubeconfig=/etc/kubernetes/kubelet.conf parameter (you can check with ps aux | grep kubelet). Inside the file you'll see the connection string, the CA cert, and the client cert + key. The kubelet presents the client certificate from the file and verifies the kube-api server cert against the CA from the same file; kube-api in turn verifies the client cert using the CA defined in its own --client-ca-file option.
2. kube-api connects to kubelet using its --kubelet-client-certificate and --kubelet-client-key options. This is probably not where the problem is.
Since you can see the TLS errors on the kube-api side and not on the kubelet side, I assume there is a problem with the communication described in point 1: kubelet connecting and authenticating to kube-api. The errors are in the kube-api logs, so I'd say kube-api has a problem verifying the kubelet's client certificate. So check that certificate inside --kubeconfig=/etc/kubernetes/kubelet.conf. You can base64-decode it and show its details with openssl or some online SSL cert checker. The most important part is that it must be signed by the CA file defined in the kube-api option --client-ca-file.
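For example, to inspect the embedded client cert (this assumes kubelet.conf embeds it as client-certificate-data; some kubeadm versions reference a file such as /var/lib/kubelet/pki/kubelet-client-current.pem instead):
sudo grep client-certificate-data /etc/kubernetes/kubelet.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -subject -issuer -dates
And to confirm it chains to the CA that kube-api trusts:
sudo grep client-certificate-data /etc/kubernetes/kubelet.conf | awk '{print $2}' | base64 -d | openssl verify -CAfile /etc/kubernetes/pki/ca.crt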
This all requires a lot of effort, to be honest, and the simplest approach you can take is to throw everything away and use kubeadm to bootstrap a single-node cluster:
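A minimal sketch of that bootstrap (illustrative flags only; pick a pod CIDR that matches the CNI plugin you plan to install):
# wipe the broken control plane state
sudo kubeadm reset -f
# re-initialize a fresh single-node control plane
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
# allow workloads to schedule on the control plane node
kubectl taint nodes --all node-role.kubernetes.io/master-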