I am experiencing irregular Signal 15 (SIGTERM) terminations of a MongoDB container (mongo:4.2.1-bionic), after which the Node.js Express container is unable to connect to MongoDB after restarting (the container restart policy is restart: always).
There is no orchestration used, just docker-compose with restart policies.
These terminations seem to occur under load, but the exact cause of the signals remains a mystery to me. It seems dockerd itself is receiving the SIGTERM, but I can't figure out where the hell it comes from.
Steps taken so far:
Checked MongoDB container logs, which showed that the container received a Signal 15 but did not provide information about the source of the signal:
2023-10-14T11:54:18.185+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Examined the log file /var/log/syslog but found no useful information around the time of the signal 15.
Examined the system logs with journalctl -u docker.service around the time of the Signal 15 but found no apparent cause. They only show:
Oct 14 11:54:18 server-dev dockerd[38876]: time="2023-10-14T13:54:18.154260979+02:00" level=info msg="Processing signal 'terminated'"
Oct 14 11:54:18 server-dev systemd[1]: Stopping Docker Application Container Engine...
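The "Stopping Docker Application Container Engine..." line is written by systemd[1], which suggests that docker.service was stopped through systemd rather than dockerd being signalled directly by some unrelated process. A minimal sketch for looking at every unit's journal entries around that moment, not only docker.service, follows. It assumes GNU date is available; the window of 5 minutes before and 1 minute after is an arbitrary illustration, and the timestamp is copied from the dockerd line above.

```shell
# Timestamp copied from the dockerd log line above (it carries its own UTC offset).
EVENT="2023-10-14T13:54:18+02:00"

# Convert to seconds since the epoch (GNU date assumed).
EPOCH=$(date -d "$EVENT" +%s)

# Illustrative window: 5 minutes before to 1 minute after the event.
# journalctl accepts "@<epoch seconds>" as a time specification.
SINCE="@$((EPOCH - 300))"
UNTIL="@$((EPOCH + 60))"

if command -v journalctl >/dev/null 2>&1; then
    # No -u filter: show all units, so anything that was logged just
    # before systemd stopped docker.service appears in context.
    journalctl --no-pager --since "$SINCE" --until "$UNTIL"
else
    echo "would run: journalctl --no-pager --since $SINCE --until $UNTIL"
fi
```

Using epoch seconds sidesteps the mismatch between the 11:54:18 journal prefix and the 13:54:18+02:00 timestamp inside the dockerd message, which refer to the same instant.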
Verified that system resource utilization (CPU, RAM, disk space) is normal.
Investigated firewall rules and found them to be correctly configured.
I attempted to find the culprit using auditd by adding the rule -a always,exit -F arch=b64 -S kill -F a0=15 -k container-sigterm-signal to /etc/audit/audit.rules and then running ausearch -k container-sigterm-signal. But this only returns entries like:
----
time->Wed Oct 11 09:57:42 2023
type=CONFIG_CHANGE msg=audit(1697011062.804:17): auid=4294967295 ses=4294967295 op=add_rule key="container-sigterm-signal" list=4 res=1
----
time->Sat Oct 14 13:54:52 2023
type=CONFIG_CHANGE msg=audit(1697284492.872:17): auid=4294967295 ses=4294967295 op=add_rule key="container-sigterm-signal" list=4 res=1
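One thing that may be worth checking in the rule above: kill(2) has the signature kill(pid, sig), so in audit terms a0 is the target PID and a1 is the signal number. A filter of a0=15 would match only a kill aimed at PID 15, which could explain why ausearch shows nothing but CONFIG_CHANGE records. A minimal sketch of the same rule filtering on a1 instead, assuming it is run as root and reusing the key name from above:

```shell
KEY="container-sigterm-signal"

# kill(pid, sig): a0 = target PID, a1 = signal number (15 = SIGTERM).
RULE="-a always,exit -F arch=b64 -S kill -F a1=15 -k ${KEY}"

if command -v auditctl >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # Word splitting of $RULE is intentional: each flag is a separate argument.
    auditctl $RULE
    # List the loaded rules to confirm the new one is active.
    auditctl -l
    # After the next termination: -i interprets numeric fields (uid, syscall, ...).
    ausearch -k "$KEY" -i
else
    echo "would run: auditctl $RULE"
fi
```

If the rule matches, the resulting SYSCALL records carry the pid, comm and exe of the process that called kill, which is the information missing so far.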
I used pgrep dockerd to find the process ID of Docker and then attached strace with nohup strace -p 38876 -o /home/web/strace.log &, but when I check the file after a restart has occurred, it only contains:
futex(0x55e9b2a69e88, FUTEX_WAIT_PRIVATE, 0, NULL <detached ...>
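A possible explanation for the single futex line: dockerd is a multi-threaded Go program, and strace -p without -f attaches only to the one thread whose ID was given, so a signal handled on another thread would not show up. A minimal sketch that attaches to all threads and records only SIGTERM deliveries, assuming strace and pgrep are installed and reusing the output path from above:

```shell
# Oldest process named exactly "dockerd" (38876 in the logs above).
PID=$(pgrep -o -x dockerd 2>/dev/null || true)
OUT="/home/web/strace.log"

# -f              attach to every thread of the process, not just one
# -tt             wall-clock timestamps with microseconds
# -e trace=none   do not log syscalls
# -e signal=...   log only deliveries of SIGTERM
STRACE_ARGS="-f -tt -e trace=none -e signal=SIGTERM -o ${OUT}"

if [ -n "$PID" ] && command -v strace >/dev/null 2>&1; then
    # Word splitting of $STRACE_ARGS is intentional.
    nohup strace $STRACE_ARGS -p "$PID" >/dev/null 2>&1 &
else
    echo "dockerd or strace not found; would run: strace $STRACE_ARGS -p <pid>"
fi
```

When a signal sent with kill is delivered, strace logs it together with the sender's si_pid and si_uid, so the log should name the sending process even if auditd stays silent.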
What other steps or troubleshooting methods can I use to identify the source of this Signal 15 issue?