Irregular Docker restarts receiving SIGNAL 15 with unknown origin


I am experiencing irregular signal 15 (SIGTERM) terminations of a MongoDB container (mongo:4.2.1-bionic). After the restart (the container restart policy is restart: always), the Node Express container is unable to connect to MongoDB.

There is no orchestration used, just docker-compose with restart policies.

These terminations seem to occur under load, but the exact cause of the signals remains a mystery to me. It seems dockerd itself is receiving the SIGTERM, but I can't figure out where the hell it comes from.


Steps taken so far:

  • Checked MongoDB container logs, which showed that the container received a Signal 15 but did not provide information about the source of the signal:

    2023-10-14T11:54:18.185+0000 I  CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
    
  • Examined log file /var/log/syslog but found no useful information around the time of the signal 15.

  • Examined system logs with journalctl -u docker.service around the time of the signal 15 but found no apparent cause. They only show:

    Oct 14 11:54:18 server-dev dockerd[38876]: time="2023-10-14T13:54:18.154260979+02:00" level=info msg="Processing signal 'terminated'"
    Oct 14 11:54:18 server-dev systemd[1]: Stopping Docker Application Container Engine...
    
  • Verified that system resource utilization (CPU, RAM, disk space) is normal.

  • Investigated firewall rules and found them to be correctly configured.
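
The log excerpts above use different time zones: MongoDB logs in UTC (+0000) while dockerd logs with a +02:00 offset. A small sketch for normalizing such timestamps to UTC before comparing them, assuming GNU date is available (the helper name to_utc is hypothetical):

```shell
# Hypothetical helper: print an ISO 8601 timestamp in UTC, truncated to
# whole seconds. Assumes GNU date (the -d option parses the input string).
to_utc() {
  date -u -d "$1" '+%Y-%m-%dT%H:%M:%S'
}

# dockerd timestamp from the journal excerpt above.
to_utc '2023-10-14T13:54:18.154260979+02:00'

# MongoDB timestamp from the container log above.
to_utc '2023-10-14T11:54:18.185+0000'
```

If both calls print the same second, the two log lines describe the same event, with dockerd handling the signal slightly before mongod.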

I tried to find the culprit with auditd by inserting the rule -a always,exit -F arch=b64 -S kill -F a0=15 -k container-sigterm-signal into /etc/audit/audit.rules and then running ausearch -k container-sigterm-signal. But this only returns entries like:

----
time->Wed Oct 11 09:57:42 2023
type=CONFIG_CHANGE msg=audit(1697011062.804:17): auid=4294967295 ses=4294967295 op=add_rule key="container-sigterm-signal" list=4 res=1
----
time->Sat Oct 14 13:54:52 2023
type=CONFIG_CHANGE msg=audit(1697284492.872:17): auid=4294967295 ses=4294967295 op=add_rule key="container-sigterm-signal" list=4 res=1
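
One detail that may explain the empty result: kill(2) is declared as kill(pid_t pid, int sig), so in an audit rule a0 is the target PID and a1 is the signal number. The rule above therefore matches kill calls aimed at PID 15 rather than SIGTERM. A sketch of a rule line that filters on the signal instead (the helper name signal_rule is hypothetical; the key is reused from the rule above):

```shell
# Hypothetical helper: print an audit rule that matches kill(2) calls
# whose second argument (a1, the signal number) equals $1.
signal_rule() {
  printf '%s\n' "-a always,exit -F arch=b64 -S kill -F a1=$1 -k container-sigterm-signal"
}

# Rule line for SIGTERM (15), in the format used in /etc/audit/audit.rules.
signal_rule 15
```

After the rules are reloaded, ausearch -k container-sigterm-signal -i should show SYSCALL records whose pid, comm and exe fields identify the sender, provided the signal is sent through kill(2). Signals sent through tkill or tgkill are not covered by this rule.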

I used pgrep dockerd to find the process ID of Docker and then attached strace with nohup strace -p 38876 -o /home/web/strace.log &, but when I check the file after a restart has occurred, it only contains:

futex(0x55e9b2a69e88, FUTEX_WAIT_PRIVATE, 0, NULL <detached ...>
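
Without -f, strace -p attaches only to the thread whose ID equals the given PID. dockerd is multi-threaded, so the signal may have been delivered to a thread that was not being traced. A sketch that attaches to all threads and logs only SIGTERM deliveries (the wrapper name trace_sigterm is hypothetical; it needs root or equivalent ptrace permission):

```shell
# Hypothetical wrapper: trace SIGTERM deliveries to every thread of a process.
#   $1 = PID to attach to, $2 = output file
trace_sigterm() {
  # -f attaches to all existing threads of the process, -tt adds timestamps,
  # -e trace=none suppresses syscall output, -e signal=SIGTERM limits the
  # signal output to SIGTERM.
  strace -f -tt -e trace=none -e signal=SIGTERM -p "$1" -o "$2"
}
```

Called as trace_sigterm "$(pgrep -x dockerd)" /home/web/strace.log, a delivered signal should appear as a --- SIGTERM {...} --- line whose si_pid field names the sending process.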

What other steps or troubleshooting methods can I use to identify the source of this Signal 15 issue?
