How to quickly fail the Kubernetes readiness probe?


If a pod goes down in my cluster, the readiness probe takes around 15 seconds or more to detect the failure. This is not acceptable because calls fail during that window: Kubernetes has not yet detected the failure, so it keeps sending traffic to the failed pod (that is, the failed pod is still listed in the ClusterIP Service endpoints).

Please suggest how to fail the readiness probe immediately, or how to remove the endpoint immediately when the pod fails, without reducing periodSeconds below 5 seconds.

Below is my configuration:

initialDelaySeconds: 90
periodSeconds: 5
timeoutSeconds: 2
successThreshold: <default>
failureThreshold: <default>

Thanks in advance.


There is 1 answer

Wytrzymały Wiktor On

You can adjust your probe's configuration to meet your requirements:

Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.

  • periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.

  • timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.

  • successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.

  • failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
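
For orientation, here is a minimal sketch of where these fields sit in a container spec, filled in with the values from the question and with the defaults written out. The Pod name, image, health path `/healthz` and port `8080` are hypothetical placeholders. The question does not say which probe handler is used, so `httpGet` is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                   # hypothetical name
spec:
  containers:
    - name: my-app               # hypothetical container name
      image: my-app:1.0          # hypothetical image
      readinessProbe:
        httpGet:                 # assumed handler; exec and tcpSocket are also possible
          path: /healthz         # hypothetical health endpoint
          port: 8080             # hypothetical port
        initialDelaySeconds: 90  # from the question
        periodSeconds: 5         # from the question
        timeoutSeconds: 2        # from the question
        successThreshold: 1      # default
        failureThreshold: 3      # default
```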

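You haven't specified failureThreshold, so it defaults to 3. With periodSeconds set to 5, the probe has to fail three times in a row, which is roughly 3 × 5 = 15 seconds, plus up to timeoutSeconds (2 seconds) whenever a probe times out instead of failing immediately. So your current values take about 15-20 seconds to consider the pod failed. Because this is a readiness probe, the pod is then marked Unready and removed from the Service endpoints; it is not restarted.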

If you set the minimal values for periodSeconds, timeoutSeconds, successThreshold and failureThreshold, you can expect more frequent checks and faster removal of the failed pod from the Service endpoints (or, for a liveness probe, faster container restarts).
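
As a rough sketch of that tuning, the fragment below keeps periodSeconds at 5, as the question requires, and lowers only timeoutSeconds and failureThreshold to their minimum of 1. The `httpGet` handler, path and port are hypothetical, since the question does not show them.

```yaml
readinessProbe:
  httpGet:                 # assumed handler, not shown in the question
    path: /healthz         # hypothetical health endpoint
    port: 8080             # hypothetical port
  initialDelaySeconds: 90  # unchanged from the question
  periodSeconds: 5         # unchanged; the question rules out going below 5
  timeoutSeconds: 1        # minimum value (was 2)
  successThreshold: 1      # default and minimum
  failureThreshold: 1      # minimum value (default is 3)
```

With failureThreshold set to 1, a single failed probe marks the pod Unready, so detection drops from about three probe periods to about one period plus the timeout. The trade-off is that one transient failure or slow response is enough to take the pod out of rotation until the next successful probe.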