I can't get Ingress to work on GKE, owing to health check failures. I've tried all of the debugging steps I can think of, including:
- Verified I'm not running low on any quotas
- Verified that my service is accessible from within the cluster
- Verified that my service works behind a k8s/GKE Load Balancer.
- Verified that healthz checks are passing in Stackdriver logs
... I'd love any advice about how to debug or fix. Details below!
I have set up a service with type LoadBalancer on GKE. It works great via its external IP:
apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: es
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: LoadBalancer
  selector:
    app: echoserver
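For reference, this is roughly how I verified that (the external IP below is a placeholder; substitute the one shown for the service):

$ kubectl --namespace es get service echoserver   # EXTERNAL-IP column shows the load balancer address
$ curl -vvv http://<external-ip>/ 2>&1 | grep "< HTTP"
$ curl -vvv http://<external-ip>/healthz 2>&1 | grep "< HTTP"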
Then I try setting up an Ingress on top of this same service:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver-ingress
  namespace: es
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "echoserver-global-ip"
spec:
  backend:
    serviceName: echoserver
    servicePort: 80
The Ingress gets created, but it thinks the backend nodes are unhealthy:
$ kubectl --namespace es describe ingress echoserver-ingress | grep backends
backends: {"k8s-be-31102--<snipped>":"UNHEALTHY"}
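The same unhealthy status can be confirmed from the GCP side with something like the following (the backend service name is the snipped k8s-be-31102--... one above, so substitute yours):

$ gcloud compute backend-services list | grep k8s-be-31102
$ gcloud compute backend-services get-health <backend-service-name> --global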
Inspecting the state of the Ingress backend in the GKE web console, I see the same thing:
The health check details appear as expected:
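(If it's useful, roughly the same details can be pulled on the command line; depending on the ingress controller version the check may live under http-health-checks or health-checks, and the name below is a placeholder:)

$ gcloud compute http-health-checks list | grep k8s-be-31102
$ gcloud compute http-health-checks describe <health-check-name>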
... and from within a pod in my cluster I can call the service successfully:
# curl -vvv echoserver 2>&1 | grep "< HTTP"
< HTTP/1.0 200 OK
# curl -vvv echoserver/healthz 2>&1 | grep "< HTTP"
< HTTP/1.0 200 OK
And I can address the service by NodePort:
# curl -vvv 10.0.1.1:31102 2>&1 | grep "< HTTP"
< HTTP/1.0 200 OK
(Which stands to reason, since the LoadBalancer service I set up in step 1 is already serving the site just fine.)
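(For completeness, the node IP and NodePort used above came from something like:)

$ kubectl get nodes -o wide                       # INTERNAL-IP column
$ kubectl --namespace es get service echoserver   # NodePort appears in PORT(S), e.g. 80:31102/TCP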
I also see healthz checks passing in Stackdriver logs:
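(Roughly the filter I used, from memory, so adjust the filter fields for your cluster:)

$ gcloud logging read 'textPayload:"GET /healthz"' --limit 10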
Regarding quotas, I checked and I'm only using 3 of 30 backend services:
$ gcloud compute project-info describe | grep -A 1 -B 1 BACKEND_SERVICES
- limit: 30.0
metric: BACKEND_SERVICES
usage: 3.0
You have configured the health check timeout to be 1 second. Perhaps increasing it to 5 seconds will solve the issue.
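If you'd rather change it from the command line than the console, something along these lines should work (substitute the health check name the ingress controller created for your backend; note the controller may reset manual edits when it re-syncs):

$ gcloud compute http-health-checks update <health-check-name> --timeout 5s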