I'm new to this forum and also quite new to Kubernetes. I'm having a problem with a GKE cluster: the status of one node keeps switching to NotReady. This has been happening at least once a day for the last two weeks, and the big problem is that it happens (and my website goes down) during the daytime, when I really need it to work. Restarting the node brings everything back to normal, but that usually takes 20 minutes and I don't have the time (or the will) to do that every day.
Looking at the logs for the node, the pattern I can see is that these three messages always appear when the node changes its status to NotReady:
2020-10-06T07:58:03.782923Z curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
2020-10-06T07:58:03.782923Z Kubelet is unhealthy!
2020-10-06T07:58:21Z Node gke-cluster-default-pool-d02df301-cyfr status is now: NodeNotReady
Does anyone have the slightest idea of what I can do to fix or at least troubleshoot this?
Best regards, Eric
A node going NotReady can happen for a couple of reasons. Please refer to this answer to debug. In addition to the above, please also check:
kubectl get events --all-namespaces
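For example, the following sketch (using the node name from your log; adjust names and namespaces to your cluster) pulls the information that usually reveals why the kubelet stopped reporting status, such as memory or disk pressure, or pods overloading that node:

# Check overall node status and the detailed conditions/events for the affected node
kubectl get nodes
kubectl describe node gke-cluster-default-pool-d02df301-cyfr

# Recent cluster events in chronological order
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp

# Which pods were scheduled on that node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=gke-cluster-default-pool-d02df301-cyfr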
With the little log you have provided so far, it seems there is some operation the kubelet is trying to perform but can't, and it is therefore setting the NotReady status. Please gather more logs and add them to the question; that will help figure out which operation the kubelet is failing to perform. If it happens to be a WordPress application (hosted on Kubernetes) problem, then this link may help.
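If you can SSH into the node while it is NotReady, you can also look at the kubelet itself. This is only a sketch: it assumes a default GKE node image where the kubelet runs as a systemd service and exposes its health endpoint on localhost port 10248 (probably the same check that produced the "Kubelet is unhealthy!" line), and <your-zone> is a placeholder for the node's zone:

# SSH to the node (it is a normal Compute Engine VM)
gcloud compute ssh gke-cluster-default-pool-d02df301-cyfr --zone <your-zone>

# On the node: kubelet service state and recent logs
systemctl status kubelet
sudo journalctl -u kubelet --since "1 hour ago" | tail -n 200

# Probe the kubelet health endpoint the same way the node health monitor does
curl -m 10 http://localhost:10248/healthz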