Pod gets into `ContainerCreating` state when the node goes down and it tries to get recreated on another node


I am facing an OpenEBS issue in my K8s infrastructure, which is deployed on AWS EKS with 3 nodes. I am deploying a StatefulSet of RabbitMQ with one replica, and I want the RabbitMQ pod's data to persist when the node goes down and the pod restarts on another node, so I deployed OpenEBS in my cluster. I terminated the node the pod was running on, and the pod tried to restart on another node. But it did not start there and remained in the ContainerCreating state, showing me the following events -

Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Normal   Scheduled           2m28s  default-scheduler        Successfully assigned rabbitmq/rabbitmq-0 to ip-10-0-1-132.ap-south-1.compute.internal
  Warning  FailedAttachVolume  2m28s  attachdetach-controller  Multi-Attach error for volume "pvc-b62d32f1-de60-499a-94f8-3c4d1625353d" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount         2m26s  kubelet                  MountVolume.SetUp failed for volume "rabbitmq-token-m99tw" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount         25s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[configuration data rabbitmq-token-m99tw]: timed out waiting for the condition

Then, after some time (around 5-10 minutes), the rabbitmq pod was able to start, but I observed that one cstor-disk-pool pod was failing with the following error -

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  7m7s (x3 over 7m9s)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.
  Warning  FailedScheduling  44s (x8 over 6m14s)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

I described that cstor-disk-pool pod, and the Node-Selectors key still has the value of the old node (which was terminated). Can someone please help me with this issue? Also, we need a way to reduce the time it takes for the rabbitmq pod to restart and become ready, as we can't afford 5-10 minutes of downtime of the rabbitmq service for our application.
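For context, the persistence comes from a volumeClaimTemplates section along these lines (a minimal sketch - the storage class name and size are placeholders, not my exact manifest):

```yaml
# Minimal sketch - storageClassName and size are illustrative placeholders.
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: openebs-cstor   # hypothetical OpenEBS cStor storage class
      resources:
        requests:
          storage: 10Gi
```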


1 Answer

Answered by Kiran Mova:

For the volumes to sustain a single node failure, you need to have created:

  • 3 cStor pools - one on each node
  • The volume configured with 3 replicas, so data is replicated to all 3 nodes (see the sketch after this list).
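As a rough illustration (the StorageClass and StoragePoolClaim names here are assumptions, not taken from the question), with the legacy non-CSI cStor provisioner the replica count is set on the StorageClass, so every PVC created from it gets 3 replicas spread across the 3 per-node pools:

```yaml
# Illustrative only - names and values are placeholders, adjust to your setup.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-cstor-3replica              # hypothetical name
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"            # claim backing the 3 per-node pools
      - name: ReplicaCount
        value: "3"                          # one replica per node
provisioner: openebs.io/provisioner-iscsi
```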

When one of the nodes is gone, the volume will be able to serve data from the remaining two replicas.

(To make the pod move from the failed node to a new node faster, you will have to configure the tolerations appropriately; the default is 5 minutes.)
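As a minimal sketch (the 30-second value is just an example), lowering the tolerationSeconds for the not-ready/unreachable taints in the RabbitMQ StatefulSet's pod template makes Kubernetes evict the pod from the lost node sooner than the default 300 seconds, which is where most of the 5-minute delay comes from:

```yaml
# Goes in the StatefulSet's pod template spec; values are illustrative.
tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30   # evict after 30s instead of the default 300s
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30
```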

The cStor pools are tied to the node on which they are created. This is done to allow re-use of the data from the pool when the node comes back. Depending on how your nodes and disks are configured, there are a few solutions that can help you automate running the cStor pools or moving them from a failed node to a new node. Could you join the #openebs channel on the Kubernetes Slack, or create an issue on the OpenEBS GitHub, to get further help?
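To make the node pinning concrete, this is roughly what it looks like in the cStor pool deployment in the openebs namespace (a trimmed, illustrative fragment; the hostname is a placeholder, not taken from the question). The hostname nodeSelector still points at the terminated node, which is why the scheduler reports "node(s) didn't match node selector" until the pool is moved or the node comes back:

```yaml
# Trimmed, illustrative fragment of a cStor pool deployment (openebs namespace).
# The pool pod can only run where its data lives, so it is pinned by hostname.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: <terminated-node-hostname>   # placeholder
```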