We have recently upgraded from docker version 17.06.0-ce to 18.09.2 on our deployment environment. Experienced container got killed suddenly after running for few days without much information in docker logs.
Monitored the memory usage, and the affected containers are well below all limits (per container and also the host has enough memory free).
Setup observations during the issue:
- docker version with 18.09.2 with around 30 running containers.
- Experienced container got killed after running for few days.
Docker Logs observed during container crash
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171040904Z" level=info msg="shim reaped" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171156262Z" level=warning msg="cleaning up after killed shim" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d namespace=moby
Nov 16 15:42:11 site1 dockerd[3022]: time="2020-11-16T15:42:11.171164295Z" level=warning msg="failed to delete process" container=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d error="ttrpc: client shutting down: ttrpc: closed: unknown" module=libcontainerd namespace=moby process=b0d77b1ebf2c82b09c152530a5e24491d76e216b852e385686c46128c94e7f5a
Nov 16 15:42:11 site1 c73920e3476c[3022]: INFO: 2020/11/16 15:42:11.396872 [nameserver a6:0c:6a:18:69:1f] container d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d died; tombstoning entry test-endpoint-s104.weave.local. -> 10.44.0.14
Output of Docker version
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
Output of Docker Info:
Containers: 30
Running: 25
Paused: 0
Stopped: 5
Images: 236
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-171-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.92GiB
Name: fpas-site1-dra-director-a
ID: KKSM:3YNF:LE7N:NVFE:Y5C4:C6CN:LAQT:QRRZ:VYQS:O4PP:VQKG:DXTK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
com.broadhop.swarm.uuid=uuid4:d96aef99-b5fc-44e3-b7fa-65b08b7e30f3
com.broadhop.swarm.role=endpoint-role
com.broadhop.swarm.node=
com.broadhop.swarm.hostname=site1
com.broadhop.swarm.mode=
com.broadhop.network.interfaces=internal:172.26.50.13
Experimental: false
Insecure Registries:
registry:5000
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
NOTE: Since this deployment is on critical infrastructure and that we want to understand why this happened and ascertain that this does not occur again. Did anyone faced same kind of issue in any environment and please let us know if there are known issues in with the docker versions being used.
Your go lang version is quite old, you may try to update. I found this issue in the github.
https://github.com/moby/moby/issues/38742