k8s pod keeps crashing (CrashLoopBackOff), but describe shows "Completed" with code 0?


I'm running a Node application built on top of AWS's Java KCL library on k8s.

Every 5 minutes or so the container crashes with "CrashLoopBackOff" and restarts, and I can't figure out why.

The container logs show no errors and at some point the stream simply ends with:

Stream closed EOF for sol/etl-sol-onchain-tx-parse-6b7d8f4c94-tf8tc (parse)

The pod events show no useful info either, looking like this:

    State:          Running
      Started:      Sun, 08 May 2022 10:06:36 -0400
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 08 May 2022 09:58:42 -0400
      Finished:     Sun, 08 May 2022 10:03:43 -0400
    Ready:          True
    Restart Count:  6

How is it possible that it says "Completed" with exit code 0? The container is a never ending process, it should never complete.

CPU/mem requests are used 25-50% at most.

What else might be causing this? The container is supposed to be using 4-7 threads (not sure if they are green threads); maybe that's the issue? It runs on an m5.large (2 vCPUs, 8 GB RAM).


There are 2 answers

Bguess On

I don't think that what you say is accurate:

The container is a never ending process, it should never complete.

In my opinion this is not linked to Kubernetes but to your application in the container. Try running your container directly on your host (with Docker, for example) and check the behavior.

Blender Fox On

Exit code 0 means the application terminated successfully, although this can be misleading depending on the application. For example, if your code exits with code 0 on an error, this is what you will see.

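Kubernetes Deployments restart containers regardless of how they terminated, success or failure, because a Deployment's pod template only allows restartPolicy: Always. Repeated restarts are subject to a back-off delay, which is why a container that keeps exiting with code 0 can still be reported as CrashLoopBackOff.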

A container is NOT a never-ending process. It can (and does) terminate when required to, for example when you are running CronJobs or Jobs.

Going back to your issue: we don't know what your application is doing, other than that it is a Node app. Is it processing something and then terminating when it finishes its queue?
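One way to check this from inside the app is to log why the Node process is exiting. The sketch below is only an illustration: installExitLogging and its parameters are hypothetical names, not part of KCL or your code. It relies on Node's 'beforeExit' event, which is emitted when the event loop has run out of work but not when process.exit() is called explicitly.

```javascript
// Hypothetical helper: logs how the Node process is terminating.
// proc and log are injectable so the helper can be exercised without
// actually ending the process.
function installExitLogging(proc = process, log = console.error) {
  // 'beforeExit' is emitted when the event loop has drained (no timers,
  // sockets or other pending work). It is not emitted for an explicit
  // process.exit() call or an uncaught exception.
  proc.on('beforeExit', (code) => {
    log(`event loop is empty, about to exit with code ${code}`);
  });

  // 'exit' is emitted when process.exit() is called or when the event
  // loop has drained.
  proc.on('exit', (code) => {
    log(`process exiting with code ${code}`);
  });
}

// Usage (at the top of the entry point):
// installExitLogging();
```

If both messages appear in the container logs just before a restart, the process most likely ran out of work rather than crashing; if only the second one appears, something probably called process.exit() explicitly.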

Try to reproduce the issue by running the container locally using something like:

docker run -it path/to/image

(If you need to mount volumes, do so)

If the container runs, processes, and terminates, do:

echo $?

This prints the exit code of the last command. If it prints 0, you have replicated what Kubernetes is seeing.

If this is the case, there are two ways you can rectify this:

  1. Change the code so that, instead of exiting with zero when it runs out of work, it goes back and checks for new work.
  2. Change the way you set up the Kubernetes resources and use either a Job (for a one-time run), or a CronJob (for a scheduled, repeating run). This won't require you to change your code in any way.
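For option 1, a minimal sketch of a worker loop that keeps polling instead of returning when it runs out of work might look like the following. It assumes the application pulls work itself; runWorker, fetchBatch, handleRecord, idleMs and shouldStop are all hypothetical names for illustration, and a KCL-based app may well be structured differently.

```javascript
// Hypothetical worker loop: fetchBatch and handleRecord stand in for
// whatever the application uses to read and process work.
async function runWorker(fetchBatch, handleRecord, options = {}) {
  const idleMs = options.idleMs ?? 1000;                  // wait between empty polls
  const shouldStop = options.shouldStop ?? (() => false); // never stops by default
  let processed = 0;

  while (!shouldStop()) {
    const batch = await fetchBatch();

    if (batch.length === 0) {
      // No work right now: sleep and poll again rather than returning,
      // which would let the process finish and exit with code 0.
      await new Promise((resolve) => setTimeout(resolve, idleMs));
      continue;
    }

    for (const record of batch) {
      await handleRecord(record);
      processed += 1;
    }
  }

  // Only reached when shouldStop() asks for a shutdown.
  return processed;
}
```

The shouldStop hook is there so the loop can still be ended deliberately, for example from a SIGTERM handler when Kubernetes stops the pod; with the default it runs until the process is killed.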