I'm using a RabbitMQ high-availability cluster with 3 nodes, version 3.8.3.
Specs:
- RAM: 4 GB
- CPU: 2 CPUs
Intermittently I get the following errors, and some nodes crash with memory alarms:
Application mnesia exited with reason: stopped
wal: encountered error during recovery: badarg
Full log entries:
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
2020-07-14 01:13:00.914 [warning] <0.328.0> rabbit_sysmon_handler busy_dist_port <0.456.0> [{name,rabbit_alarm},{initial_call,{gen_event,init_it,6}},{erlang,bif_return_trap,2},{message_queue_len,0}] {#Port<0.968>,unknown}
2020-07-14 01:13:02.838 [warning] <0.328.0> rabbit_sysmon_handler busy_dist_port <0.684.0> [{initial_call,{rabbit_prequeue,init,1}},{erts_internal,dsend_continue_trap,1},{message_queue_len,0}] {#Port<0.968>,unknown}
2020-07-14 01:31:34.457 [info] <0.8.0> Log file opened with Lager
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: list of feature flags found:
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: [x] drop_unroutable_metric
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: [x] empty_basic_get_metric
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: [x] implicit_default_bindings
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: [x] quorum_queue
2020-07-14 01:31:37.800 [info] <0.8.0> Feature flags: [x] virtual_host_metadata
2020-07-14 01:31:37.800 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-07-14 01:31:37.910 [info] <0.43.0> Application mnesia exited with reason: stopped
2020-07-14 01:31:38.072 [info] <0.395.0> ra: meta data store initialised. 0 record(s) recovered
2020-07-14 01:31:38.072 [info] <0.402.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@rmq-3/quorum/rabbit@rmq-3/00000058.wal"]
2020-07-14 01:31:38.518 [warning] <0.402.0> wal: encountered error during recovery: badarg
During this time I could see that the system iowait was high, and there was a high number of TCP errors.
What could be the possible reasons for this?
Any help would be greatly appreciated.
Thanks.
This doesn't solve the node crashing problem, but according to this Google Groups post, the
wal: encountered error during recovery: badarg
message in 3.8.3 can be ignored. So perhaps that line is a red herring and your problem lies elsewhere.