Looking at the Hazelcast source code (version 3.2.6), it appears that the only way that a Hazelcast node can spontaneously become inactive and start throwing HazelcastInstanceNotActiveException - other than an application-initiated shutdown, of course - is an out-of-memory condition.
Is that correct? Or are there any other reasons?
Thank you in advance.
I don't know if there are other reasons, but an OOME can certainly lead to a HazelcastInstance shutdown.
If you are running low on memory (70% or more used), the Hazelcast HealthMonitor should kick in and periodically log all kinds of metrics. Can you check your logging?
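If you want to cross-check heap usage from inside your own application, something like the following might help. It is only a rough sketch using the standard JMX memory bean, not Hazelcast's own health monitor code. The class name, method names, and threshold constant are made up for illustration, and the 0.70 value simply mirrors the 70% figure mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Hazelcast: reports how full the heap is.
class HeapUsageCheck {
    // Assumption: 0.70 mirrors the 70% figure from this thread; adjust as needed.
    static final double THRESHOLD = 0.70;

    // Returns used/max as a fraction, or -1.0 when the maximum is undefined.
    static double usedRatio(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 if no maximum is defined
        }
        return (double) usedBytes / maxBytes;
    }

    // True when the used fraction is at or above the threshold.
    static boolean isRunningLow(long usedBytes, long maxBytes) {
        return usedRatio(usedBytes, maxBytes) >= THRESHOLD;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long used = heap.getUsed();
        long max = heap.getMax();
        if (max <= 0) {
            System.out.println("heap maximum is undefined; used bytes: " + used);
            return;
        }
        System.out.printf("heap used: %d of %d bytes (%.1f%%)%n",
                used, max, usedRatio(used, max) * 100);
        if (isRunningLow(used, max)) {
            System.out.println("heap usage is at or above the threshold");
        }
    }
}
```

Calling this periodically (for example from a scheduled task) next to the GC log gives a second view of how close the node is to running out of heap.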
We normally run our performance/stress tests with:
-verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
This way we can see what is happening on the gc level.
PS: It is impossible for HZ to trap every OOME, so we only see the ones that happen internally; as soon as one is detected, the HazelcastInstance is shut down.