I built a kubernetes cluster, using flannel overlay network. The problem is one of the service ip isn't always accessible.
I tested within the cluster, by telneting the service ip and port, ended in connection timeout. Checked with netstat, the connection was always in "SYN_SENT" state, seemed that peer didn't accept connection. But if I telnet to the pod ip and port that backed the service directly, the connection could be made successfully. It only happened to one of the service, other services are ok.
And if I scaled the backend pod to a larger value, like 2. Then some of requests to the service ip can succeed. It seemed that the service wasn't able to connect to one of the backed pod.
Which component may be the cause of such problem? My service configuration, kube-proxy or flannel?
Check the discussion here: https://github.com/kubernetes/kubernetes/issues/38802
It's required to
sysctl net.bridge.bridge-nf-call-iptables=1
on nodes.