We have multiple headless services running in our Azure AKS (VMAS) cluster. Sometimes, seemingly at random, we observe that CoreDNS fails to resolve the headless services and logs the following error:
E0909 09:31:22.241120 1 runtime.go:73] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Note that while this issue is occurring, non-headless services (services that have cluster IPs) are still resolved properly.
To work around the issue in the dev/svt environment, we delete the coredns pod in the kube-system namespace, and everything starts working again for a brief period of time (1/2 days).
This deletion operation cannot be performed in the customer deployment scenario.
We raised a ticket with the AKS team, but since coredns is a third-party project, it doesn't come under Azure's support domain.
Has anyone faced this issue with CoreDNS? What is the permanent solution?
Maybe this will help someone: https://github.com/coredns/coredns/issues/4022
This is a known defect in CoreDNS. You need to upgrade CoreDNS inside AKS to a newer version that has the fix applied (1.7.0).
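To tell whether a given cluster is still on an affected build, you can compare the CoreDNS image tag against 1.7.0. The snippet below is only a rough sketch: it assumes the image reference ends in a `major.minor.patch` tag (optionally prefixed with `v`), and the helper names and sample image strings are made up for illustration. On most clusters the image string can be read with `kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'`, though the deployment name may differ in your setup.

```python
import re

# Version named in the answer above as carrying the fix.
FIXED_VERSION = (1, 7, 0)


def parse_coredns_version(image: str) -> tuple:
    """Extract (major, minor, patch) from an image reference.

    Assumes the reference ends in a tag such as ':1.6.6' or ':v1.8.4'.
    """
    tag = image.rsplit(":", 1)[-1]
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"cannot parse a version from {image!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(image: str) -> bool:
    """Return True if the image is older than the fixed version."""
    return parse_coredns_version(image) < FIXED_VERSION


if __name__ == "__main__":
    # Hypothetical image strings, for illustration only.
    for image in ("example.registry/coredns:1.6.6", "example.registry/coredns:1.7.0"):
        print(image, "-> upgrade needed" if needs_upgrade(image) else "-> has the fix")
```

Versions are compared as integer tuples rather than strings so that, for example, 1.10.0 is correctly treated as newer than 1.7.0.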