I have set up a test environment with Docker UCP. After a few days, one of the controllers randomly went down, and UCP reported that the host was down and the cluster was not healthy.
Logs from the controller container:
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:10Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context
Could this be a random network connectivity problem? If so, shouldn't it have recovered automatically?
After examining the Docker daemon on the host, I saw that the system had run into this issue:
https://github.com/docker/for-linux/issues/162
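
For anyone who wants to check whether the Docker daemon on a node has stopped answering (which would match the "unable to inspect container: context deadline exceeded" warnings above), here is a minimal sketch. It only runs ordinary Docker CLI commands with a timeout; the helper name command_responds and the 10 second timeout are my own assumptions, not anything UCP provides.

```python
import subprocess
import sys


def command_responds(cmd, timeout_s=10.0):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    Hypothetical helper: a hung daemon usually shows up as a CLI call
    that never returns, so a timeout is treated as "not responding".
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The command did not finish in time (subprocess.run kills it).
        return False
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Both commands need an answer from the daemon, so they hang or fail
    # when the daemon itself is stuck.
    for cmd in (["docker", "info"], ["docker", "ps"]):
        ok = command_responds(cmd)
        print(" ".join(cmd), "->", "ok" if ok else "no reply within timeout")
    sys.exit(0)
```

If both commands time out while the host itself is still reachable over the network, that points to the daemon rather than to connectivity, although it does not by itself confirm the linked issue.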