I got stuck at 404 during the tutorial on deploying kserve models using the sklearn-iris example


I clicked ‘+ New Endpoint’ under Endpoints in the Kubeflow dashboard and registered the resource below.

kind: "InferenceService"
metadata:
  annotations:
    sidecar.istio.io/inject: "false"
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      image: "kserve/sklearnserver:v0.10.0"
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

sidecar.istio.io/inject: "false" disables Istio sidecar injection for this service.

and I verified that the status is enabled via the Dashboard UI.

  • URL external: http://sklearn-iris.pipeline.svc.cluster.local
  • URL internal: http://sklearn-iris.pipeline.svc.cluster.local/v1/models/sklearn-iris:predict
$ kubectl get InferenceService sklearn-iris -n namespace
NAME           URL                                              READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.pipeline.svc.cluster.local   True           100                              sklearn-iris-predictor-default-00001   24h

Below is the Python code I used to call my sklearn-iris example.

sklear_iris_input = dict(instances = [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
])


import requests
import kfp

HOST = "http://127.0.0.1:8080/"   # local host
# When using 'https', error 
# `HTTPSConnectionPool(host='127.0.0.1', port=8080): 
# Max retries exceeded with url: /v1/models/v1/models/sklearn-iris:predict (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1131)')))`
# occurs in session.get(HOST, verify=False)

session = requests.Session()
response = session.get(HOST, verify=False)        

USERNAME = "[email protected]"
PASSWORD = "password"


headers = {
"Content-Type" : "application/x-www-form-urlencoded",       # using 'form data'
}

data = {'login': USERNAME, "password": PASSWORD}
session.post(response.url, headers = headers, data=data)
session_cookie = session.cookies.get_dict()                     

import json
headers = {'Host': 'sklearn-iris.pipeline.svc.cluster.local'}
res = session.post(f"{HOST}v1/models/sklearn-iris:predict", 
                    headers = headers,
                    cookies = session_cookie,
                    data = json.dumps(sklear_iris_input))
print(res.json)     

and I got <bound method Response.json of <Response [404]>>.

I also tried with curl.

curl -v -H "Host: sklearn-iris.pipeline.svc.cluster.local" \
   -d '{"instances": [[5.1, 3.5, 1.4, 0.2], [5.9, 3.0, 5.1, 1.8]]}' \
   http://sklearn-iris-python2.pipeline.svc.cluster.local:80/v1/models/sklearn-iris-python2:predict

and I got this

* Could not resolve host: sklearn-iris.pipeline.svc.cluster.local
* Closing connection 0
curl: (6) Could not resolve host: sklearn-iris.pipeline.svc.cluster.local

Why does this happen?

Things I tried in order to solve this problem:

1. Edited the ingress gateway setting in KServe's config map, assuming it had been deployed with the wrong gateway path.

$ kubectl edit configmaps -n kserve inferenceservice-config

original:

...
ingress: |-
{
    "ingressGateway" : "knative-serving/knative-ingress-gateway",          
...

fixed to this

...
ingress: |-
{
    "ingressGateway" : "kubeflow/kubeflow-gateway",          
...

But the problem was not resolved.

2. Checked whether the DNS server is working properly.

$ kubectl run --rm -it busybox --image=busybox:1.28 --restart=Never -- nslookup sklearn-iris.pipeline.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      sklearn-iris.pipeline.svc.cluster.local
Address 1: 10.108.6.239 knative-local-gateway.istio-system.svc.cluster.local
pod "busybox" deleted

The DNS server (10.96.0.10) is working, but the problem was not resolved.

3. Checked whether the domain name was entered correctly.

The domain name is correct.

4. Checked whether the network is connected.

The network is connected.

What can I do to solve this problem?

Please help me!

versions

  • Ubuntu: 20.04
  • kubernetes: 1.25.00
  • kubeflow: 1.7.0
  • kserve: 0.9.0, 0.10.0 (I tried in both cases.)
1 Answer

John Sarle

This seems to be very similar to a problem I was having here (github issue link).

I am going to assume you are using Knative version >= 1.8, because that would be in line with the other versions you have posted.

Based on that, here is the explanation of the issue as posted by Dan Sun, co-founder of KServe. I have reproduced his exact response below for transparency.

Since you are using Knative 1.8, there is a change that it defaults the domain to svc.cluster.local which is not exposed on ingress, if you want to run the curl command outside of the kube cluster you would need to configure the external domain. see https://kserve.github.io/website/0.10/admin/serverless/serverless/#1-install-knative-serving and https://github.com/kubeflow/manifests/tree/master/contrib/kserve#steps
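To make the quoted point concrete, here is a minimal Python sketch (not part of the original thread) that checks whether an InferenceService URL still uses the cluster-internal svc.cluster.local suffix, and shows what the hostname would look like once an external domain such as example.com is configured. The helper names is_cluster_local and external_hostname are hypothetical, and the name.namespace.domain layout assumes Knative's default domain template has not been customized.

```python
from urllib.parse import urlparse

# Suffix used by Kubernetes for in-cluster service DNS names.
CLUSTER_LOCAL_SUFFIX = ".svc.cluster.local"


def is_cluster_local(url: str) -> bool:
    """Return True if the URL's host is a cluster-internal DNS name."""
    host = urlparse(url).hostname or ""
    return host.endswith(CLUSTER_LOCAL_SUFFIX)


def external_hostname(name: str, namespace: str, domain: str) -> str:
    """Build the host expected under a name.namespace.domain template.

    Assumption: Knative's default domain template is in effect.
    """
    return f"{name}.{namespace}.{domain}"


if __name__ == "__main__":
    # URL reported by `kubectl get InferenceService` in the question.
    print(is_cluster_local("http://sklearn-iris.pipeline.svc.cluster.local"))
    # Host to expect after configuring example.com as the domain.
    print(external_hostname("sklearn-iris", "kserve-test", "example.com"))
```

If the first check returns True, the URL is only resolvable from inside the cluster, which matches the "Could not resolve host" error seen with curl in the question.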

Here is the link to my comment where I figured it out, with all of the exact steps I took to solve it. For transparency, I have also reproduced that comment below.

Update: I tried starting over from scratch and was able to get everything to work. It turns out I was following the last set of directions incorrectly. All of the versions are the same as in my original post. Thank you @yuzisun for all of your help. I really appreciate it.

Steps I took

  1. Fresh kind cluster creation

    kind create cluster --config=kind-local-pv.yml
    

    kind-local-pv.yml:

    apiVersion: kind.x-k8s.io/v1alpha4
    kind: Cluster
    nodes:
    - role: control-plane
      image: kindest/node:v1.24.0@sha256:406fd86d48eaf4c04c7280cd1d2ca1d61e7d0d61ddef0125cb097bc7b82ed6a1
      # Mount Path Defaults
      extraMounts:
        - hostPath: /mnt
          containerPath: /mnt
        - hostPath: /home/$USER/git
          containerPath: /home/$USER/git
    
  2. Fresh Kubeflow install using the instructions here:

    while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
    
  3. Followed steps 1, 2 and 3 located here

    "1". Create test namespace

    kubectl create ns kserve-test
    

    "2". Configure domain name

    kubectl patch cm config-domain --patch '{"data":{"example.com":""}}' -n knative-serving
    

    "3". Port forward

    INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
    kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
    
  4. Followed steps 2, 3 and 5 located here

    "2". Create an InferenceService

    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
    

    "3". Check InferenceService status. Wait until READY equals True

    kubectl get inferenceservices sklearn-iris -n kserve-test
    

    "5". Step 5 but only data creation aspect

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
    
  5. Followed the instructions located here. The only differences between what is linked and what is below are that CLUSTER_IP=localhost:8080 and that SERVICE_HOSTNAME uses -n kserve-test instead of -n admin. Additionally, I got the SESSION variable by inspecting my Kubeflow Central Dashboard page in order to see the cookie that was used, then copied that into an env variable.

    MODEL_NAME=sklearn-iris
    INPUT_PATH=@./iris-input.json
    CLUSTER_IP=localhost:8080
    SERVICE_HOSTNAME=$(kubectl get -n kserve-test inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Cookie: authservice_session=${SESSION}" http://${CLUSTER_IP}/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}
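
Since the question used Python, the curl call above could be sketched in Python roughly as follows. This is an illustration under assumptions, not part of the original answer: it uses the standard library's urllib instead of requests, the helper name build_predict_request is hypothetical, it sets a JSON content type explicitly (curl -d does not), and it expects SERVICE_HOSTNAME and SESSION to be exported as environment variables. The response body is parsed explicitly; the question's print(res.json) printed the bound method rather than the body because the method was never called.

```python
import json
import os
import urllib.error
import urllib.request


def build_predict_request(cluster_ip, service_hostname, model_name, session_cookie, instances):
    """Build (without sending) the same POST request the curl command makes."""
    url = f"http://{cluster_ip}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {
        "Host": service_hostname,  # the ingress gateway routes on this header
        "Cookie": f"authservice_session={session_cookie}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def main():
    hostname = os.environ.get("SERVICE_HOSTNAME")
    session = os.environ.get("SESSION")
    if not hostname or not session:
        print("Export SERVICE_HOSTNAME and SESSION first.")
        return
    request = build_predict_request(
        cluster_ip="localhost:8080",  # the port-forward from step 3
        service_hostname=hostname,
        model_name="sklearn-iris",
        session_cookie=session,
        instances=[[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            print(response.status, json.loads(response.read()))
    except urllib.error.HTTPError as err:
        # Non-2xx status: print the code and raw body for debugging.
        print(err.code, err.read().decode("utf-8", errors="replace"))
    except urllib.error.URLError as err:
        # For example, the port-forward is not running.
        print("Connection failed:", err.reason)


if __name__ == "__main__":
    main()
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and headers before anything is sent.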