Name resolution from Windows pods does not work on Kubernetes 1.18.1


I have the setup shown below, with Flannel in VXLAN mode. Name resolution does not work from pods on the Windows nodes. I verified that the following works:

  1. Windows POD -> external DNS server resolution
  2. Windows POD -> HTTPS connection to kubernetes API service IP
  3. Linux POD on master -> Name resolution against DNS service

The following does not work:

  1. Windows POD -> DNS query against DNS service
  2. Windows POD -> DNS query against IP of DNS pod

gregory@master1:~$ k get nodes
NAME         STATUS   ROLES    AGE    VERSION
master1      Ready    master   22h    v1.18.1
winworker1   Ready    <none>   15h    v1.18.1
winworker2   Ready    <none>   169m   v1.18.1

DNS repro

PS C:\> Test-NetConnection 10.96.0.10 -port 53
WARNING: TCP connect to (10.96.0.10 : 53) failed
ComputerName           : 10.96.0.10
RemoteAddress          : 10.96.0.10
RemotePort             : 53
InterfaceAlias         : vEthernet (62a92abe4497c380bae9dfdee71ae5069cd0bd1b66208f58016345b7a6d9fabe_flannel.4096)
SourceAddress          : 10.244.1.4
PingSucceeded          : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded       : False

PS C:\> Test-NetConnection 10.96.0.1 -port 443
ComputerName     : 10.96.0.1
RemoteAddress    : 10.96.0.1
RemotePort       : 443
InterfaceAlias   : vEthernet (62a92abe4497c380bae9dfdee71ae5069cd0bd1b66208f58016345b7a6d9fabe_flannel.4096)
SourceAddress    : 10.244.1.4
TcpTestSucceeded : True

PS C:\> Resolve-dnsname www.google.com -server 8.8.8.8
Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
www.google.com                                 AAAA   299   Answer     2607:f8b0:4004:811::2004
www.google.com                                 A      299   Answer     172.217.15.100

PS C:\> Resolve-dnsname www.google.com -server 10.96.0.10
Resolve-dnsname : www.google.com : This operation returned because the timeout period expired

There is 1 answer

Gregory Suvalian On

FYI: Kubernetes 1.18.1 has a bug on Windows nodes where the network called host fails to be created on reboot (https://github.com/kubernetes-sigs/sig-windows-tools/issues/52). As a result, communication within Flannel is broken, even if you recreate the network manually with docker network create -d nat host. To make DNS resolution work again, you also need to restart the Rancher wins service: Get-Service rancher-wins | Restart-Service. Until this is fixed, the complete workaround is to modify the StartKubelet.ps1 file and add the following at line 3:

# Recreate the "host" NAT network if it is missing after a reboot
$netId = docker network ls -f name=host --format "{{ .ID }}"
if ($netId.Length -lt 1) {
    docker network create -d nat host
}
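
To confirm that the workaround took effect, the checks from the question can be repeated from inside a Windows pod. The following is only a minimal sketch: the DNS service IP is the one shown in the question, while the variable names and the lookup name kubernetes.default.svc.cluster.local are assumptions (the latter presumes the default cluster.local cluster domain), so adjust them to your cluster.

```powershell
# Cluster DNS service IP, taken from the question above
$dnsServiceIp = "10.96.0.10"

# Assumption: the cluster uses the default "cluster.local" domain
$name = "kubernetes.default.svc.cluster.local"

# TCP reachability of the DNS service on port 53
Test-NetConnection $dnsServiceIp -Port 53

# Actual lookup through the cluster DNS service
Resolve-DnsName $name -Server $dnsServiceIp
```

If the host network and the Rancher wins service are healthy again, both commands should succeed where they previously timed out.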