Error on RabbitMQ "wal: encountered error during recovery: badarg"


I'm running a highly available RabbitMQ cluster with 3 nodes, version 3.8.3.

Spec:

  • RAM: 4 GB
  • CPU: 2 CPUs

Intermittently I get the following errors, and some nodes crash with memory alarms:

Application mnesia exited with reason: stopped
wal: encountered error during recovery: badarg

Full log entries:

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
2020-07-14 01:13:00.914 [warning] <0.328.0> rabbit_sysmon_handler busy_dist_port <0.456.0> [{name,rabbit_alarm},{initial_call,{gen_event,init_it,6}},{erlang,bif_return_trap,2},{message_queue_len,0}] {#Port<0.968>,unknown}
2020-07-14 01:13:02.838 [warning] <0.328.0> rabbit_sysmon_handler busy_dist_port <0.684.0> [{initial_call,{rabbit_prequeue,init,1}},{erts_internal,dsend_continue_trap,1},{message_queue_len,0}] {#Port<0.968>,unknown}
2020-07-14 01:31:34.457 [info] <0.8.0> Log file opened with Lager
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags: list of feature flags found:
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags:   [x] drop_unroutable_metric
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags:   [x] empty_basic_get_metric
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags:   [x] implicit_default_bindings
2020-07-14 01:31:37.799 [info] <0.8.0> Feature flags:   [x] quorum_queue
2020-07-14 01:31:37.800 [info] <0.8.0> Feature flags:   [x] virtual_host_metadata
2020-07-14 01:31:37.800 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-07-14 01:31:37.910 [info] <0.43.0> Application mnesia exited with reason: stopped
2020-07-14 01:31:38.072 [info] <0.395.0> ra: meta data store initialised. 0 record(s) recovered
2020-07-14 01:31:38.072 [info] <0.402.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@rmq-3/quorum/rabbit@rmq-3/00000058.wal"]
2020-07-14 01:31:38.518 [warning] <0.402.0> wal: encountered error during recovery: badarg

During this time, the system iowait was high.

I also saw a high number of TCP errors.

What could be causing this?

Any help would be greatly appreciated.

Thanks.

Answer by iinuwa:

This doesn't solve the node crashing problem, but according to this Google Groups post, the "wal: encountered error during recovery: badarg" message in 3.8.3 can be ignored:

"This error message has no impact at all and will not be printed in 3.8.4"

So perhaps that line is a red herring and your problem is elsewhere.
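
If the wal line is noise, it may help to filter it out and see what else the log is warning about (for instance the busy_dist_port entries right after the memory alarm banner). Below is a minimal sketch in Python, assuming the log lines follow the "timestamp [level] <pid> message" layout shown in the question. The names LOG_LINE, BENIGN and triage are made up for this illustration and are not part of RabbitMQ.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in the question:
# 2020-07-14 01:31:38.518 [warning] <0.402.0> wal: encountered error during recovery: badarg
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<level>\w+)\] <[\d.]+> (?P<msg>.*)$"
)

# Messages reported as harmless on 3.8.3 (per the post quoted above).
BENIGN = ("wal: encountered error during recovery: badarg",)


def triage(lines):
    """Count warning/error messages by kind, skipping known-benign ones."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m["level"] not in ("warning", "error"):
            continue  # banner lines, info lines, and anything unparseable
        msg = m["msg"]
        if any(b in msg for b in BENIGN):
            continue
        # Group by the first two tokens, e.g. "rabbit_sysmon_handler busy_dist_port"
        counts[" ".join(msg.split()[:2])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python triage.py < rabbit@rmq-3.log
    for kind, n in triage(sys.stdin).most_common():
        print(n, kind)
```

On the excerpt in the question this leaves only the busy_dist_port warnings, which fits the idea that the real problem is pressure on the node (memory, I/O, inter-node traffic) rather than WAL recovery.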