I installed a Kubernetes (v1.26) cluster on Ubuntu 22.04.
- I am able to launch nginx on the master node and run curl against it, but hello-world-deployment doesn't work; its pods stay Pending forever
- it appears the apiserver, controller-manager, and scheduler have trouble connecting to each other; every 10 minutes or so kube-scheduler and kube-controller-manager restart
- I am unable to query etcd using etcdctl
I'd appreciate it if anyone can suggest possible fixes for these problems. Let me know if any further information from the cluster is needed.
Thanks!
$ sudo etcdctl --endpoints=https://127.0.0.1:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --key-file=/etc/kubernetes/pki/etcd/server.key --ca-file=/etc/kubernetes/pki/etcd/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error: unexpected status code 404
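(I later noticed that the flags above, --cert-file/--key-file/--ca-file, and the cluster-health subcommand are v2-era etcdctl usage, while the etcd bundled with Kubernetes v1.26 only serves the v3 API — which might be where the 404 comes from. If I understand correctly, the v3-style equivalent would be something like the following; the certificate paths are the kubeadm defaults used above:)

```shell
# v3-style etcdctl: --cacert/--cert/--key instead of
# --ca-file/--cert-file/--key-file, and `endpoint health`
# instead of `cluster-health`
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```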
netstat output for the etcd ports:
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.108:2379 0.0.0.0:* LISTEN -
tcp 0 0 192.168.1.108:2380 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:54950 127.0.0.1:2379 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:55226 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:54952 ESTABLISHED -
tcp 0 0 127.0.0.1:55272 127.0.0.1:2379 ESTABLISHED -
tcp 0 0 127.0.0.1:55068 127.0.0.1:2379 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:54996 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:54872 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:54898 ESTABLISHED -
tcp 0 0 127.0.0.1:2379 127.0.0.1:54750 ESTABLISHED -
tcp 0 0 192.168.1.108:54406 192.168.1.108:2379 ESTABLISHED -
$ kubelet --version
Kubernetes v1.26.0
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:57:06Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
$ k get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-k8s.sadhanapath.com Ready control-plane 29h v1.26.0 192.168.1.108 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic cri-o://1.25.2
But I am unable to figure out why these errors occur or how to fix them. I'd appreciate some insight into the problem and possible solutions.
$ k get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-world-deployment-8679c476ff-jrc5j 0/1 Pending 0 96m
default hello-world-deployment-8679c476ff-snftx 0/1 Pending 0 96m
default my-pod 1/1 Running 0 130m
kube-flannel kube-flannel-ds-rv72q 1/1 Running 0 29h
kube-system coredns-787d4945fb-l5g9l 1/1 Running 0 29h
kube-system coredns-787d4945fb-zvblc 1/1 Running 0 29h
kube-system etcd-master-k8s.sadhanapath.com 1/1 Running 7 29h
kube-system kube-apiserver-master-k8s.sadhanapath.com 1/1 Running 2 29h
kube-system kube-controller-manager-master-k8s.sadhanapath.com 1/1 Running 280 (115s ago) 29h
kube-system kube-proxy-kd2tl 1/1 Running 0 29h
kube-system kube-scheduler-master-k8s.sadhanapath.com 1/1 Running 360 (112s ago) 29h
kube-system metrics-server-6bf7778f96-xfrq4 0/1 Pending 0 96m
my-pod (nginx)
~~~~~~~~~~~~~~~~
$ curl http://192.168.3.96
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Errors I noticed:
kube-apiserver log:
~~~~~~~~~~~~~~~~~~~~
I0117 16:08:30.322251 1 shared_informer.go:280] Caches are synced for garbage collector
I0117 16:08:30.376084 1 shared_informer.go:280] Caches are synced for garbage collector
I0117 16:08:30.376107 1 garbagecollector.go:163] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
E0117 16:14:32.331089 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0117 16:14:37.330451 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": context deadline exceeded
I0117 16:14:37.330505 1 leaderelection.go:283] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
E0117 16:14:37.330543 1 controllermanager.go:294] "leaderelection lost"
...
I0116 10:51:15.888114 1 alloc.go:327] "allocated clusterIPs" service="kube-system/kube-dns" clusterIPs=map[IPv4:192.168.4.10]
I0116 10:51:15.919529 1 controller.go:615] quota admission added evaluator for: daemonsets.apps
I0116 10:51:26.347486 1 controller.go:615] quota admission added evaluator for: replicasets.apps
I0116 10:51:27.150749 1 controller.go:615] quota admission added evaluator for: controllerrevisions.apps
E0116 10:55:10.060857 1 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, context canceled]"
E0116 10:55:10.060935 1 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
E0116 10:55:10.060965 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0116 10:55:10.062138 1 writers.go:135] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0116 10:55:10.063299 1 timeout.go:142] post-timeout activity - time-elapsed: 2.437285ms, GET "/api" result: <nil>
{"level":"warn","ts":"2023-01-16T10:55:10.313Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0116 10:55:10.313501 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
E0116 10:55:10.313566 1 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
{"level":"warn","ts":"2023-01-16T10:55:10.313Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0116 10:55:10.313616 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
E0116 10:55:10.314562 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0116 10:55:10.316213 1 writers.go:135] apiserver was unable to write a fallback JSON response: http: Handler timeout
{"level":"warn","ts":"2023-01-16T10:55:10.317Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0116 10:55:10.317381 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
{"level":"warn","ts":"2023-01-16T10:55:10.317Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004d8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
E0116 10:55:10.317708 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}: context canceled
E0116 10:55:10.317970 1 timeout.go:142] post-timeout activity - time-elapsed: 5.19228ms, GET "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler" result: <nil>
E0116 10:55:10.318756 1 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
E0116 10:55:10.319894 1 writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout
etcd log:
~~~~~~~~~
{"level":"info","ts":"2023-01-17T16:16:26.767Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":181612138,"revision":119833,"compact-revision":119531}
WARNING: 2023/01/17 16:19:23 [core] grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
{"level":"info","ts":"2023-01-17T16:19:52.743Z","caller":"traceutil/trace.go:171","msg":"trace[2082787816] transaction","detail":"{read_only:false; response_revision:120259; number_of_response:1; }","duration":"112.693292ms","start":"2023-01-17T16:19:52.630Z","end":"2023-01-17T16:19:52.743Z","steps":["trace[2082787816] 'process raft request' (duration: 112.557096ms)"],"step_count":1}
{"level":"info","ts":"2023-01-17T16:19:54.134Z","caller":"traceutil/trace.go:171","msg":"trace[708529753] transaction","detail":"{read_only:false; response_revision:120260; number_of_response:1; }","duration":"118.409797ms","start":"2023-01-17T16:19:54.016Z","end":"2023-01-17T16:19:54.134Z","steps":["trace[708529753] 'process raft request' (duration: 118.295288ms)"],"step_count":1}
{"level":"info","ts":"2023-01-17T16:19:56.267Z","caller":"traceutil/trace.go:171","msg":"trace[539414652] transaction","detail":"{read_only:false; response_revision:120261; number_of_response:1; }","duration":"126.094165ms","start":"2023-
kube-controller-manager:
~~~~~~~~~~~~~~~~~~~~~~~~~
I0117 16:25:29.040164 1 shared_informer.go:280] Caches are synced for HPA
I0117 16:25:29.377578 1 shared_informer.go:280] Caches are synced for garbage collector
I0117 16:25:29.379762 1 shared_informer.go:280] Caches are synced for garbage collector
I0117 16:25:29.379789 1 garbagecollector.go:163] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
E0117 16:26:30.231228 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0117 16:26:35.231206 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": context deadline exceeded
I0117 16:26:35.231277 1 leaderelection.go:283] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
E0117 16:26:35.231330 1 controllermanager.go:294] "leaderelection lost"
kube-scheduler:
~~~~~~~~~~~~~~~
I0117 16:28:09.110528 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0117 16:28:09.110542 1 shared_informer.go:273] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0117 16:28:09.211149 1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0117 16:28:09.211151 1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0117 16:28:09.211261 1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-scheduler...
I0117 16:28:09.211927 1 shared_informer.go:280] Caches are synced for RequestHeaderAuthRequestController
I0117 16:28:09.223818 1 leaderelection.go:258] successfully acquired lease kube-system/kube-scheduler
E0117 16:28:56.592893 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-scheduler: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0117 16:29:01.592295 1 leaderelection.go:330] error retrieving resource lock kube-system/kube-scheduler: Get "https://192.168.1.108:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=5s": context deadline exceeded
I0117 16:29:01.592378 1 leaderelection.go:283] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
E0117 16:29:02.745104 1 server.go:224] "Leaderelection lost"
I have ufw enabled with ports opened for 6443, 2379:2380, 10250:10255, and 30000:32767.
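For reference, the ufw rules I had in place were roughly the following (a sketch from the port list above, run as root; the comments describe what each range is for):

```shell
sudo ufw allow 6443/tcp          # kube-apiserver
sudo ufw allow 2379:2380/tcp     # etcd client and peer traffic
sudo ufw allow 10250:10255/tcp   # kubelet API and read-only ports
sudo ufw allow 30000:32767/tcp   # NodePort service range
```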
I then disabled ufw completely, yet the scheduler and controller-manager restart counts keep incrementing, and the apiserver HTTP timeouts and write failures continue.
Even so, I am able to create another nginx instance and run curl against it:
$ k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-deployment-8679c476ff-jrc5j 0/1 Pending 0 162m <none> <none> <none> <none>
hello-world-deployment-8679c476ff-snftx 0/1 Pending 0 162m <none> <none> <none> <none>
my-pod 1/1 Running 0 3h16m 192.168.3.96 master-k8s.sadhanapath.com <none> <none>
my-pod2 1/1 Running 0 15s 192.168.3.97 master-k8s.sadhanapath.com <none> <none>
$ curl http://192.168.3.97
(same "Welcome to nginx!" page as above)
I'd appreciate any suggestions on how to solve this problem.
Update: I uninstalled Kubernetes completely as shown here ... How to completely uninstall kubernetes
and reinstalled it from scratch following https://www.linuxtechi.com/install-kubernetes-on-ubuntu-22-04/
I don't intend to run a multi-member etcd cluster for now (https://learnk8s.io/etcd-kubernetes), just a simple cluster with a couple of nodes, so I suppose I may not need etcdctl.
Thanks