Apache Ignite cluster nodes going down abruptly

We have an Apache Ignite setup (version 2.7.6, OS RHEL 7.x) in one of our higher environments. We have recently started seeing one or two server nodes go down abruptly. Requests to the cluster come from Spring Boot applications running in our Kubernetes cluster.

Interestingly, our other Ignite cluster runs smoothly with the same setup, serving requests from another Kubernetes cluster.

The two Kubernetes-Ignite cluster setups are in different geographical locations.

The following error messages are printed in the server node logs:

[ERROR][sys-stripe-117-#118%EnterpriseNGPR%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.ClassCastException: o.a.i.i.processors.cache.distributed.near.GridNearGetRequest cannot be cast to o.a.i.i.GridJobExecuteRequest]]

java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.distributed.near.GridNearGetRequest cannot be cast to org.apache.ignite.internal.GridJobExecuteRequest
    at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1923) ~[ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569) ~[ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197) ~[ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127) ~[ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093) ~[ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505) [ignite-core-2.7.6.jar:2.7.6]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]

We have not been able to determine what could be causing this error. Any help is greatly appreciated.

Since our other Ignite cluster with the same configuration is running smoothly, we haven't done much so far beyond restarting the affected cluster.
