Kubernetes cluster on GCE connection refused error


I am trying to create a k8s cluster using Google Compute Engine, Terraform, and Ansible. I created three VMs through Terraform and installed Docker and Kubernetes on them through Ansible. I want to use Calico as a network add-on. I receive a connection refused error on port 6443 every time. After some debugging I found the problem in this part.

- name: kubeadm init - only master
  shell: |
    kubeadm init --service-cidr 10.96.0.0/12 --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 0.0.0.0
  when:
    - ansible_facts['hostname'] == 'master'

- name: Copy kubeconfig
  shell: |
    mkdir -p /root/.kube
    cp /etc/kubernetes/admin.conf /root/.kube/config
    chown $(id -u):$(id -g) /root/.kube/config
    kubeadm token create --print-join-command > /tmp/.token
  when:
    - ansible_facts['hostname'] == 'master'

After I connected to the VM through Google Cloud Platform and ran kubectl get nodes, it gave me a connection refused error. Then I ran this part again as my own user on the VM.

mkdir -p $HOME/.kube
sudo cp  /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Then kubectl get nodes works and shows the node. So what should I do to solve this in Ansible? I cannot add the Calico add-on because of this error. I think the problem comes from the users (root vs. my own user)?
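For reference, a minimal Ansible sketch of copying the kubeconfig for the login user instead of root. This is hypothetical: it assumes the play runs with become: true and that ansible_user holds the GCE login user name.

```yaml
# Hypothetical tasks: copy the admin kubeconfig into the login user's home
# instead of /root, so kubectl works for that user without sudo.
# Assumes become: true and that ansible_user is the GCE login user.
- name: Create .kube directory for the login user
  file:
    path: "/home/{{ ansible_user }}/.kube"
    state: directory
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    mode: "0755"
  when: ansible_facts['hostname'] == 'master'

- name: Copy admin.conf to the login user's kubeconfig
  copy:
    src: /etc/kubernetes/admin.conf
    dest: "/home/{{ ansible_user }}/.kube/config"
    remote_src: true
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    mode: "0600"
  when: ansible_facts['hostname'] == 'master'
```

Using the file and copy modules with owner/group, rather than shell with chown, makes the ownership explicit and idempotent.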

1 Answer

Sai Chandini Routhu:

The connection refused error you see when running kubectl get nodes is probably caused by the way your Ansible playbook handles the kubeconfig file.

Here is how to resolve the problem and make the Calico deployment possible:

  • Make sure the Kubernetes API server on the master node is up and listening on port 6443. Confirm that no firewall rules are blocking the connection, and check the status of the kube-apiserver service.
  • For security reasons, it is generally discouraged to run Ansible tasks with root privileges. Instead, create a dedicated user for Kubernetes management and configure Ansible to use that user.
  • In the playbook, once the kubeconfig has been copied to the non-root user's home directory, ensure that the user has the necessary permissions; use the file module's owner and group options to set ownership and group correctly.
  • Change the playbook's kubectl get nodes task to point at the non-root user's kubeconfig.
  • Once you have a working kubeconfig for the non-root user, you can deploy Calico, for example using the official Calico CNI role for Kubernetes on Ansible Galaxy.
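The last two bullets could be sketched as Ansible tasks like the following. This is an assumption-laden sketch: the k8sadmin user, the kubeconfig path, and the Calico manifest version are all placeholders, not values from the question.

```yaml
# Hypothetical sketch: point kubectl at the non-root user's kubeconfig via
# the KUBECONFIG environment variable, then apply the Calico manifest.
# The k8sadmin user and the manifest version are assumptions.
- name: Verify the cluster responds for the non-root user
  command: kubectl get nodes
  environment:
    KUBECONFIG: /home/k8sadmin/.kube/config
  become: false
  when: ansible_facts['hostname'] == 'master'

- name: Deploy Calico
  command: kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
  environment:
    KUBECONFIG: /home/k8sadmin/.kube/config
  become: false
  when: ansible_facts['hostname'] == 'master'
```

Setting KUBECONFIG per task avoids hard-coding the config path into every kubectl invocation.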

For better security, instead of pasting the kubeconfig directly into the playbook, consider storing it as a Kubernetes Secret. Use Ansible Vault to safely handle private data such as kubeadm tokens.
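Vault usage could look like this sketch; the secrets.yml file name and the kubeadm_join_command variable are assumptions for illustration.

```yaml
# Encrypt the secrets file once on the control machine with:
#   ansible-vault encrypt group_vars/all/secrets.yml
# Then reference vaulted variables like any other variable:
- name: Join worker to the cluster
  command: "{{ kubeadm_join_command }}"  # hypothetical variable defined in the vaulted secrets.yml
  when: ansible_facts['hostname'] != 'master'
```

Run the playbook with --ask-vault-pass (or a vault password file) so Ansible can decrypt the variables at runtime.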

Refer to this blog by Austin for more information.

EDIT1:

Here are some additional troubleshooting steps to address the connection refused error on port 6443 with a non-root user, assuming firewall rules allow the port:

  1. Verify the kubeadm init command that Ansible executes on the master node. Make sure that 0.0.0.0 is the value of the --apiserver-advertise-address flag. This tells the API server to listen on all interfaces, enabling worker nodes to connect using any IP address.

  2. Verify that there is adequate network connectivity between the worker and master nodes. You can test this by pinging or SSHing from a worker node to the master's IP address.

  3. Check that worker nodes can resolve the master node's hostname. Run hostname -f on the master node to obtain its fully qualified domain name (FQDN), then ping that name from a worker node to verify resolution.

  4. Although you state that firewall rules permit port 6443, confirm that no additional rules on the worker nodes block inbound traffic on that port.

  5. Even though Calico may not be the root cause of the error, make sure it is deployed and configured correctly after the initial cluster setup with kubeadm init.

  6. Check the Ansible output of the kubeadm init and kubeconfig copy tasks for any errors or warnings.

  7. Check the master node's Kubernetes API server logs for any connection or authentication-related issues.

  8. To verify that the join token you created for the worker nodes is still valid, run kubeadm token list on the master node.

  9. For testing purposes, you can run the kubeadm init command manually on the master node to rule out problems with the Ansible execution.
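Steps 1, 2, and 4 above ultimately come down to checking whether the API server port is reachable over TCP. A minimal shell sketch (the master IP below is a placeholder, not an address from the question):

```shell
#!/usr/bin/env bash
# Report whether a TCP port on a host is reachable: prints "open" or "closed".
# Uses bash's /dev/tcp pseudo-device, so this requires bash, not plain sh.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Example: probe the API server port on the master (placeholder IP).
check_port 10.128.0.2 6443
```

If this prints "closed" from a worker node, the problem is connectivity or firewalling, not the kubeconfig.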

Refer to the official Kubernetes documentation on initializing your control-plane node for more details.