Updating KServe from 0.7 to 0.9: my .mar file works on 0.7 but not on 0.9, while the example runs without issue on 0.9


I have been tasked with updating KServe from 0.7 to 0.9. Our company .mar files run fine on 0.7, and after updating to KServe 0.9 the pods come up without issue. However, when a request is sent, it returns a 500 error. The logs are given below.

Model being used: PyTorch
Deployment type: RawDeployment
Kubernetes version: 1.25
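
For reference, this is roughly how the failing request is sent; a minimal sketch in Python (the host, port, model name, and payload below are placeholders, not our real values):

```python
import requests

# Placeholder endpoint: a locally reachable KServe predictor,
# matching the "POST /v1/models/modelname:predict" seen in the logs.
URL = "http://localhost:5000/v1/models/modelname:predict"

# Minimal v1 inference payload; the real request body carries
# our model's actual inputs.
payload = {"instances": [{"data": "example input"}]}

resp = requests.post(URL, json=payload)
print(resp.status_code)  # 200 on KServe 0.7, 500 on KServe 0.9
print(resp.text)
```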

Defaulted container "kserve-container" out of: kserve-container, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-11-18T13:37:44,001 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-11-18T13:37:44,203 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /usr/local/lib/python3.8/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /usr/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /mnt/models/model-store
Model config: N/A
2022-11-18T13:37:44,208 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-11-18T13:37:44,288 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring 
2022-11-18T13:37:44,297 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2022-11-18T13:37:44,298 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
[I 221118 13:37:46 __main__:75] Wrapper : Model names ['modelname'], inference address http//0.0.0.0:8085, management address http://0.0.0.0:8085, model store /mnt/models/model-store
[I 221118 13:37:46 TorchserveModel:54] kfmodel Predict URL set to 0.0.0.0:8085
[I 221118 13:37:46 TorchserveModel:56] kfmodel Explain URL set to 0.0.0.0:8085
[I 221118 13:37:46 TSModelRepository:30] TSModelRepo is initialized
[I 221118 13:37:46 model_server:150] Registering model: modelname
[I 221118 13:37:46 model_server:123] Listening on port 8080
[I 221118 13:37:46 model_server:125] Will fork 1 workers
[I 221118 13:37:46 model_server:128] Setting max asyncio worker threads as 12
2022-11-18T13:37:54,738 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model modelname
2022-11-18T13:37:54,738 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model modelname
[I 221118 13:40:12 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:12 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
        response = await model(body)
      File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
        response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
      File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
        response = await self._http_client.fetch(
    ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:12 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 9.66ms
[I 221118 13:40:13 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:13 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
        response = await model(body)
      File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
        response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
      File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
        response = await self._http_client.fetch(
    ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:13 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 3.31ms
[I 221118 13:40:14 TorchserveModel:78] PREDICTOR_HOST : 0.0.0.0:8085
[E 221118 13:40:14 web:1789] Uncaught exception POST /v1/models/modelname:predict (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:5000', method='POST', uri='/v1/models/modelname:predict', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/usr/local/lib/python3.8/dist-packages/kserve/handlers/predict.py", line 70, in post
        response = await model(body)
      File "/usr/local/lib/python3.8/dist-packages/kserve/model.py", line 86, in __call__
        response = (await self.predict(request)) if inspect.iscoroutinefunction(self.predict) \
      File "/home/model-server/kserve_wrapper/TorchserveModel.py", line 80, in predict
        response = await self._http_client.fetch(
    ConnectionRefusedError: [Errno 111] Connection refused
[E 221118 13:40:14 web:2239] 500 POST /v1/models/modelname:predict (127.0.0.1) 3.38ms

I was not able to find the tornado package (/usr/local/lib/python3.8/dist-packages/tornado/web.py) inside the .mar file, so I don't think it is being used directly by the model.
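
Since a .mar file is just a zip archive, I checked for bundled tornado files roughly like this (the archive path is a placeholder):

```python
import zipfile

# Placeholder path to our model archive
MAR_PATH = "modelname.mar"

# List the archive contents and look for anything tornado-related.
with zipfile.ZipFile(MAR_PATH) as mar:
    hits = [name for name in mar.namelist() if "tornado" in name.lower()]

print(hits or "no tornado files found in the archive")
```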

I tried deploying it on both KServe 0.7 and 0.9. Our .mar file works on KServe 0.7 but fails on KServe 0.9. I also deployed the sample inference service (https://kserve.github.io/website/0.9/modelserving/v1beta1/torchserve/#create-the-torchserve-inferenceservice) on KServe 0.9 and it worked as expected.

I deployed it on GKE, RKE2, and Docker Desktop Kubernetes.
