Seldon-core deployment in GKE private cluster with Anthos Service Mesh

155 views Asked by At

I'm trying to use GKE private cluster with standard config, with the Anthos service mesh managed profile. However, when I try to deploy "Iris" model for the test, the deployment stuck in calling "storage.googleapis.com":

$ kubectl get all -n test
NAME                                                  READY   STATUS     RESTARTS   AGE
pod/iris-model-default-0-classifier-dfb586df4-ltt29   0/3     Init:1/2   0          30s

NAME                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/iris-model-default              ClusterIP   xxx.xxx.65.194   <none>        8000/TCP,5001/TCP   30s
service/iris-model-default-classifier   ClusterIP   xxx.xxx.79.206   <none>        9000/TCP,9500/TCP   30s

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/iris-model-default-0-classifier   0/1     1            0           31s

NAME                                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/iris-model-default-0-classifier-dfb586df4   1         1         0       31s
$ kubectl logs -f -n test pod/iris-model-default-0-classifier-dfb586df4-ltt29 -c classifier-model-initializer
2022/11/19 20:59:34 NOTICE: Config file "/.rclone.conf" not found - using defaults
2022/11/19 20:59:57 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 20:59:57 ERROR : Attempt 1/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : Attempt 2/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused

I used "sidecar injection" with the namespace labeling:

kubectl create namespace test
kubectl label namespace test istio-injection- istio.io/rev=asm-managed --overwrite
kubectl annotate --overwrite namespace test mesh.cloud.google.com/proxy='{"managed":"true"}'

When I don't use "sidecar injection", the deployment was quite successful. But in this case I need to inject the proxy manually to get the accesss to the model API. I wonder if this is the intended operation or not.

1

There are 1 answers

3
Adrian Gonzalez-Martin On

Istio sidecars will block connectivity on other init containers. This is a known issue with Istio sidecars unfortunately. A potential workaround is to ask Istio to don't "filter" traffic going to storage.googleapis.com (i.e. don't route that traffic through Istio's egress), which can be done through Istio's excludeIPRanges flag.

In the longer term, due to these shortcomings, Istio seems to be moving away from sidecars into their new "Ambient mesh".