Something very weird is happening on a Kubernetes 1.26.5 cluster bootstrapped with Kubespray (one master and one node only). The cluster uses CoreDNS with nodelocaldns, which Kubespray also set up automatically.
The first thing I noticed was that one of my Certificates was not being issued automatically by cert-manager.
At first I thought it was hairpin NAT, but a few tests ruled that theory out.
I then ran a few queries from a throwaway pod and got this (I replaced the real domain and the real IP with fake ones):
❯ kubectl run --rm -it busybox-2 --image=arunvelsriram/utils:latest
If you don't see a command prompt, try pressing enter.
utils@busybox-2:~$ nslookup admin-test-info.my.domain.com
Server: 169.254.25.10
Address: 169.254.25.10#53
Non-authoritative answer:
admin-test-info.my.domain.com canonical name = my.domain.com.
Name: my.domain.com
Address: 50.50.50.50
utils@busybox-2:~$ curl -v http://admin-test-info.my.domain.com
* Rebuilt URL to: http://admin-test-info.my.domain.com/
^C
This curl request hangs there, with no answer from the DNS server. I then tried a curl request directly against the canonical name, and got this:
utils@busybox-2:~$ nslookup my.domain.com
Server: 169.254.25.10
Address: 169.254.25.10#53
Non-authoritative answer:
Name: my.domain.com
Address: 50.50.50.50
utils@busybox-2:~$ curl -v http://my.domain.com
* Rebuilt URL to: http://my.domain.com/
* Trying 50.50.50.50...
* TCP_NODELAY set
* Connected to my.domain.com (50.50.50.50) port 80 (#0)
> GET / HTTP/1.1
> Host: my.domain.com
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Mon, 04 Dec 2023 15:26:45 GMT
< Content-Length: 19
<
404 page not found
* Connection #0 to host my.domain.com left intact
The 404 is expected here, since there's no Ingress routing for this particular host.
I have already modified the nodelocaldns configuration so that admin-test-info.my.domain.com looks like an A record pointing to a local IP, and that got rid of the issue, but I still need to understand what's going on here. Any clue?
The issue may be caused by the DNS configuration or by the network settings. You can try these troubleshooting steps to narrow down the cause:
Double-check that the DNS configuration in your /etc/resolv.conf file is correct.
Ensure your network settings, such as the IP address and netmask, are correct.
Inspect your logs, especially warnings and entries related to admin-test-info.my.domain.com.
You can try to deploy another pod on a different node to check if the issue persists.
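For the /etc/resolv.conf check, a small shell sketch like the one below may help. It is only an illustration: the function names resolv_summary and query_candidates are made up for this answer, and the second function assumes the usual glibc-style rule that a name with fewer dots than ndots is tried against the search list before being tried as-is. Other resolver libraries can behave differently.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a pod's resolver settings.

# Print the nameservers, search list and ndots value from a resolv.conf file.
# Usage: resolv_summary [path]   (path defaults to /etc/resolv.conf)
resolv_summary() {
    awk '
        $1 == "nameserver" { ns = ns (ns ? "," : "") $2 }
        $1 == "search"     { $1 = ""; sub(/^ +/, ""); search = $0 }
        $1 == "options"    {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] }
        }
        END {
            print "nameservers=" ns
            print "search=" search
            # The resolver default is ndots:1 when the option is absent.
            print "ndots=" (ndots == "" ? 1 : ndots)
        }
    ' "${1:-/etc/resolv.conf}"
}

# Print, in order, the names a glibc-style stub resolver would query.
# Usage: query_candidates HOST NDOTS [SEARCH_DOMAIN...]
query_candidates() {
    host="$1"; ndots="$2"; shift 2
    case "$host" in
        *.) echo "$host"; return ;;   # trailing dot: absolute name, no search list
    esac
    dots=$(printf '%s' "$host" | awk -F. '{ print NF - 1 }')
    # Enough dots: the name is tried as-is before the search list.
    if [ "$dots" -ge "$ndots" ]; then echo "$host."; fi
    for d in "$@"; do echo "$host.$d."; done
    # Too few dots: the name is tried as-is only after the search list.
    if [ "$dots" -lt "$ndots" ]; then echo "$host."; fi
}
```

To use it, run resolv_summary inside the test pod (or against a copy of the pod's /etc/resolv.conf), then pass the reported ndots value and search domains to query_candidates together with admin-test-info.my.domain.com. If ndots is larger than 3, the listed search-domain variants are queried before the real name, and you can feed each of them to nslookup to see whether one of those queries is the one that hangs.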