kubelet service can't access kube-apiserver at port 6443 over HTTPS due to error "net/http: TLS handshake timeout"

I am provisioning a workload cluster with one control plane node and one worker node on top of OpenStack via Cluster API. However, the Kubernetes control plane fails to start properly on the control plane node.

I can see that kube-apiserver keeps exiting and being recreated:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock ps -a
CONTAINER           IMAGE               CREATED              STATE               NAME                      ATTEMPT             POD ID
a729fdd387b0a       90d27391b7808       About a minute ago   Running             kube-apiserver            74                  88de61a0459f6
38b54a71cb0aa       90d27391b7808       3 minutes ago        Exited              kube-apiserver            73                  88de61a0459f6
24573a1c5adc5       b0f1517c1f4bb       18 minutes ago       Running             kube-controller-manager   4                   cc113aaae13b5
a2072b64cca1a       b0f1517c1f4bb       29 minutes ago       Exited              kube-controller-manager   3                   cc113aaae13b5
f26a531972518       d109c0821a2b9       5 hours ago          Running             kube-scheduler            1                   df1d15fd61a8f
a91b4c0ce9e27       303ce5db0e90d       5 hours ago          Running             etcd                      1                   16e1f0f5bb543
1565a1a7dedec       303ce5db0e90d       5 hours ago          Exited              etcd                      0                   16e1f0f5bb543
35ae23eb64f11       d109c0821a2b9       5 hours ago          Exited              kube-scheduler            0                   df1d15fd61a8f
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$

From the kube-apiserver container's log I can see "http: TLS handshake error from 172.24.4.159:50812: EOF":

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ sudo crictl --runtime-endpoint /run/containerd/containerd.sock logs -f a729fdd387b0a
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0416 20:32:25.730809       1 server.go:596] external host was not specified, using 10.6.0.9
I0416 20:32:25.744220       1 server.go:150] Version: v1.17.3
......
......
I0416 20:33:46.816189       1 dynamic_cafile_content.go:166] Starting request-header::/etc/kubernetes/pki/front-proxy-ca.crt
I0416 20:33:46.816832       1 dynamic_cafile_content.go:166] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
I0416 20:33:46.833031       1 dynamic_serving_content.go:129] Starting serving-cert::/etc/kubernetes/pki/apiserver.crt::/etc/kubernetes/pki/apiserver.key
I0416 20:33:46.853958       1 secure_serving.go:178] Serving securely on [::]:6443
......
......
I0416 20:33:51.784715       1 log.go:172] http: TLS handshake error from 172.24.4.159:60148: EOF
I0416 20:33:51.786804       1 log.go:172] http: TLS handshake error from 172.24.4.159:60150: EOF
I0416 20:33:51.788984       1 log.go:172] http: TLS handshake error from 172.24.4.159:60158: EOF
I0416 20:33:51.790695       1 log.go:172] http: TLS handshake error from 172.24.4.159:60210: EOF
I0416 20:33:51.792577       1 log.go:172] http: TLS handshake error from 172.24.4.159:60214: EOF
I0416 20:33:51.793861       1 log.go:172] http: TLS handshake error from 172.24.4.159:60202: EOF
I0416 20:33:51.805506       1 log.go:172] http: TLS handshake error from 10.6.0.9:35594: EOF
I0416 20:33:51.806056       1 log.go:172] http: TLS handshake error from 172.24.4.159:60120: EOF
......

From syslog I can see the apiserver serving cert is signed for IP 172.24.4.159:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ grep "apiserver serving cert is signed for DNS names" /var/log/syslog 
Apr 16 15:25:56 ubu1910-medflavor-nolb3-control-plane-nh4hf cloud-init[652]: [certs] apiserver serving cert is signed for DNS names [ubu1910-medflavor-nolb3-control-plane-nh4hf kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.6.0.9 172.24.4.159]
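
To double-check, the SANs can also be read directly off the serving cert on disk (standard kubeadm path assumed):

# Print the Subject Alternative Names baked into the serving cert
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'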

And from syslog I can also see the kubelet service can't access the apiserver due to "net/http: TLS handshake timeout":

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ tail -F /var/log/syslog 
Apr 16 19:36:18 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:18.596206    1504 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get https://172.24.4.159:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:19.202346090Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:19 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:19.274089    1504 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Apr 16 19:36:20 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: W0416 19:36:20.600457    1504 status_manager.go:530] Failed to get status for pod "kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf_kube-system(24ec7abb1b94172adb053cf6fdd1648c)": Get https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf: net/http: TLS handshake timeout
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf containerd[568]: time="2021-04-16T19:36:24.336699210Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 16 19:36:24 ubu1910-medflavor-nolb3-control-plane-nh4hf kubelet[1504]: E0416 19:36:24.379374    1504 controller.go:135] failed to ensure node lease exists, will retry in 7s, error: Get https://172.24.4.159:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubu1910-medflavor-nolb3-control-plane-nh4hf?timeout=10s: context deadline exceeded
......
......

I also tried to access the apiserver with curl, and I see:

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl http://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
Client sent an HTTP request to an HTTPS server.

ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$ curl https://172.24.4.159:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-ubu1910-medflavor-nolb3-control-plane-nh4hf
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
ubuntu@ubu1910-medflavor-nolb3-control-plane-nh4hf:~$
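
The second failure looks like curl simply not trusting the cluster CA; pointing it at the CA bundle (standard kubeadm path assumed) should show whether the serving cert itself verifies:

# A 401/403 response body is fine here; what matters is whether TLS verification succeeds
curl --cacert /etc/kubernetes/pki/ca.crt https://172.24.4.159:6443/version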

Is there something wrong with kube-apiserver's certificate? Any idea how I can continue troubleshooting?

1 Answer

Answered by welcomeboredom:

If you want to see the details of your kube-api SSL cert, you can use curl -k -v https://172.24.4.159:6443 or openssl s_client -connect 172.24.4.159:6443.
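
For example, to dump just the interesting fields of the cert the apiserver actually presents (a sketch; the IP and port are taken from your output):

# Print subject, issuer and validity of the live serving cert
openssl s_client -connect 172.24.4.159:6443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates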

You didn't mention how you're provisioning your certificates. SSL in Kubernetes is a complicated beast, and setting up certificates and all the communication paths manually can be very painful. That's why people use kubeadm nowadays.

TL;DR: you must ensure all the certificates are signed by /etc/kubernetes/pki/ca.crt.
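
A quick way to check that on disk (substitute whichever cert you want to validate):

# "OK" means the cert chains to the cluster CA
sudo openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt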

Since you have a single control plane node, I assume the kubelet is running as a systemd unit on that server? How is the kube-api container launched? By the kubelet process itself, because you have pod definitions in /etc/kubernetes/manifests?
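
You can confirm both quickly (assuming the usual kubeadm layout):

# Static pod manifests the kubelet watches, plus the kubelet unit itself
ls /etc/kubernetes/manifests/
systemctl status kubelet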

There are actually two ways of communication between the kubelet and kube-api, and both are used at the same time:

  1. The kubelet connects and authenticates to kube-api using the information from its --kubeconfig=/etc/kubernetes/kubelet.conf parameter (you can check with ps aux | grep kubelet). Inside that file you'll see the connection string, the CA cert, and the client cert + key. The kubelet presents the client certificate from the file and verifies kube-api's serving cert against the CA from the same file; kube-api verifies the client cert using the CA defined in its own --client-ca-file option (see the sketch after this list).
  2. kube-api connects to the kubelet using the --kubelet-client-certificate and --kubelet-client-key options. This is probably not where the problem is.
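
To see which files kube-api actually uses for each direction, you can grep its static pod manifest (assuming the usual kubeadm path):

# --client-ca-file is the CA kube-api trusts for incoming client certs;
# the --kubelet-client-* flags cover the reverse direction
sudo grep -E 'client-ca-file|kubelet-client' /etc/kubernetes/manifests/kube-apiserver.yaml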

Since you can see the SSL error on the kube-api side and not on the kubelet side, I assume there is a problem with the communication described in point 1: the kubelet connecting and authenticating to kube-api. The error is in the kube-api logs, so I'd say kube-api has a problem verifying the kubelet's client certificate. So check that certificate inside --kubeconfig=/etc/kubernetes/kubelet.conf. You can base64-decode it and show its details with openssl or some online SSL cert checker. The most important part is that it must be signed by the CA file defined in kube-api's --client-ca-file option.
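
For example (a sketch; note that on some setups kubelet.conf references /var/lib/kubelet/pki/kubelet-client-current.pem instead of embedding the data):

# Decode the embedded client cert and print who issued it
sudo grep client-certificate-data /etc/kubernetes/kubelet.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -subject -issuer -dates

If the issuer doesn't match the subject of /etc/kubernetes/pki/ca.crt, you've found your problem.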

This all requires a lot of effort, to be honest, and the simplest approach you can take is to throw everything away and use kubeadm to bootstrap a single-node cluster (rough sketch after the links):

  1. clean your server of all the mess
  2. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
  3. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
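
In rough strokes that boils down to something like this (destructive, and a sketch that assumes kubeadm, the kubelet and a container runtime are already installed):

# Wipe the existing control plane state, then bootstrap a fresh one;
# --apiserver-cert-extra-sans puts the floating IP into the serving cert's SANs
sudo kubeadm reset -f
sudo kubeadm init --apiserver-cert-extra-sans=172.24.4.159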