Kubernetes master not starting up in OpenStack Heat


I have been trying to set up a Kubernetes cluster in OpenStack for the last week or so using this guide. I have faced a few issues in the process, one of which is described in this question -> kube-up.sh failes in OpenStack

When I run the ./cluster/kube-up.sh script, it tries to bring up the cluster using the openstack stack create step (Log). For some reason the Kubernetes master does not come up properly, and this is where the installation fails. I was able to SSH into the master node and found this in /var/log/cloud-init-output.log:

[..]
Complete!
*  INFO: Running install_centos_stable_post()
*  INFO: Running install_centos_check_services()
*  INFO: Running install_centos_restart_daemons()
*  INFO: Running daemons_running()
*  INFO: Salt installed!
2017-01-02 12:57:31,574 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2017-01-02 12:57:31,576 - util.py[WARNING]: Running scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 0.7.5 finished at Mon, 02 Jan 2017 12:57:31 +0000. Datasource DataSourceOpenStack [net,ver=2].  Up 211.20 seconds

On digging further, I found this snippet in the /var/log/messages file -> https://paste.ubuntu.com/23733430/

So I assume that the Docker daemon is not starting up. Something is also wrong with my etcd configuration, because of which the flanneld service is not starting up either. Here is the output of service flanneld status:

● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: activating (start) since Tue 2017-01-03 13:32:37 UTC; 48s ago
Main PID: 15666 (flanneld)
CGroup: /system.slice/flanneld.service
       └─15666 /usr/bin/flanneld -etcd-endpoints= -etcd-prefix= -iface=eth0 --ip-masq

Jan 03 13:33:16 kubernetesstack-master flanneld[15666]: E0103 13:33:16.229827 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:17 kubernetesstack-master flanneld[15666]: E0103 13:33:17.230082 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:18 kubernetesstack-master flanneld[15666]: E0103 13:33:18.230326 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:19 kubernetesstack-master flanneld[15666]: E0103 13:33:19.230560 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:20 kubernetesstack-master flanneld[15666]: E0103 13:33:20.230822 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:21 kubernetesstack-master flanneld[15666]: E0103 13:33:21.231325 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:22 kubernetesstack-master flanneld[15666]: E0103 13:33:22.231581 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:23 kubernetesstack-master flanneld[15666]: E0103 13:33:23.232140 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:24 kubernetesstack-master flanneld[15666]: E0103 13:33:24.234041 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Jan 03 13:33:25 kubernetesstack-master flanneld[15666]: E0103 13:33:25.234277 15666 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured

My etcd daemon is running:

[root@kubernetesstack-master salt]# netstat -tanlp | grep etcd
tcp        0      0 192.168.173.3:4379      0.0.0.0:*               LISTEN      20338/etcd
tcp        0      0 192.168.173.3:4380      0.0.0.0:*               LISTEN      20338/etcd

Although it's running on non-standard ports.

I'm also on a corporate network behind a proxy. Any pointers on how to debug this further are appreciated. I have reached a dead end on how to proceed. Asking in the Kubernetes Slack channels has also produced zero results!


There is 1 answer

Answered by mdaniel

/usr/bin/flanneld -etcd-endpoints=

That line is the source of your troubles, assuming you didn't elide the output before posting it. Your situation is made worse by etcd running on non-standard ports, but thankfully I think the same fix solves both problems.
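To confirm that the flags really are empty in the running process (and not just cut off in the paste), something like the following could help. It is only a sketch: `empty_flags` is a made-up helper name, and it simply splits a command line on spaces and prints any flag that ends in a bare `=`.

```shell
# empty_flags: read a command line on stdin and print every flag that was
# passed with an empty value (for example "-etcd-endpoints=").
empty_flags() {
  tr ' ' '\n' | grep -E '^--?[A-Za-z0-9-]+=$'
}

# Command line copied from the status output in the question.
echo '/usr/bin/flanneld -etcd-endpoints= -etcd-prefix= -iface=eth0 --ip-masq' | empty_flags
# prints:
# -etcd-endpoints=
# -etcd-prefix=

# On the master itself, the live process could be checked the same way:
#   ps -o args= -C flanneld | empty_flags
```

Note that `-etcd-prefix=` shows up as empty too, so whatever is failing to populate the endpoints is probably failing to populate the prefix as well.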

I would expect running systemctl cat flanneld.service (you may need sudo, depending on the strictness of your systemd setup) to output the unified systemd descriptor for flanneld, including any "drop-ins", overrides, etc. If my theory is correct, one of them will contain either Environment= or EnvironmentFile=, and that is where I bet flanneld.service expected to find ETCD_ENDPOINTS= or FLANNELD_ETCD_ENDPOINTS= (as seen here) made available to the Exec.
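If the unit output is long, a small filter can pull out just the EnvironmentFile= paths and tell you whether each one exists. This is a sketch under assumptions: `env_files` and `check_env_file` are hypothetical helper names, and the unit text in the demo (including the path /etc/sysconfig/flanneld and the variable FLANNEL_ETCD_ENDPOINTS) is invented for illustration, not taken from your machine.

```shell
# env_files: read unit text on stdin and print each EnvironmentFile= path,
# dropping the optional leading "-" that tells systemd to ignore a missing file.
env_files() {
  sed -n 's/^EnvironmentFile=-\{0,1\}//p'
}

# check_env_file: say whether a path is missing, empty, or has content.
check_env_file() {
  if [ ! -e "$1" ]; then
    echo "$1: missing"
  elif [ ! -s "$1" ]; then
    echo "$1: empty"
  else
    echo "$1: has content"
  fi
}

# Hypothetical unit text, for illustration only; the real one will differ.
env_files <<'EOF'
[Service]
EnvironmentFile=-/etc/sysconfig/flanneld
ExecStart=/usr/bin/flanneld -etcd-endpoints=${FLANNEL_ETCD_ENDPOINTS}
EOF
# prints: /etc/sysconfig/flanneld

# On the master:
#   systemctl cat flanneld.service | env_files | while read -r f; do check_env_file "$f"; done
```

The variable names that appear in the real ExecStart= line are the ones that matter; those are what the environment file has to define.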

So hopefully that file is either missing or blank. In either case you are one swift vi away from teaching flanneld about your etcd endpoints, and all being well in the world again.
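If you would rather script the edit than open vi, here is a rough sketch. Everything specific in it is an assumption: `set_env_var` is a hypothetical helper, FLANNEL_ETCD_ENDPOINTS stands in for whatever variable name your unit's ExecStart= line actually references, and the value assumes that 4379 is the etcd client port and that etcd is serving plain http. The demo runs against a scratch file, not the real environment file.

```shell
# set_env_var FILE NAME VALUE
# Replace (or add) a NAME="VALUE" line in an environment file, writing the
# result back in place so the file keeps its owner and permissions.
set_env_var() {
  file=$1 name=$2 value=$3
  [ -f "$file" ] || : > "$file"              # create the file if it is missing
  tmp=$(mktemp)
  grep -v "^${name}=" "$file" > "$tmp" || true   # keep every other line
  printf '%s="%s"\n' "$name" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a scratch file that starts with a blank assignment.
scratch=$(mktemp)
printf 'FLANNEL_ETCD_ENDPOINTS=""\n' > "$scratch"
set_env_var "$scratch" FLANNEL_ETCD_ENDPOINTS "http://192.168.173.3:4379"
cat "$scratch"
# prints: FLANNEL_ETCD_ENDPOINTS="http://192.168.173.3:4379"
rm -f "$scratch"
```

On the master you would point it at the file named in EnvironmentFile=, do the same for the variable behind -etcd-prefix=, and then run sudo systemctl restart flanneld and check the status output again.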