The environment is Ubuntu 20.04.5 with a Tesla P4, CUDA 11.4, and NVIDIA driver 470.223.02; the RAPIDS image is based on rapidsai/base:23.10-cuda11.2-py3.10. I wrote a KServe transformer service and deployed it with Kubernetes, giving the pod 4 CPUs, 4 GB of memory, and 1 Tesla P4. The service raises an error when I call the API, but the same code succeeds when I exec into the container and run it by hand to debug.
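For reference, this is the kind of sanity check I can run inside the pod to confirm that the GPU and driver are visible to the container (the snippet is generic and not part of my service; numba is present in the RAPIDS image as a cudf dependency, and nvidia-smi is assumed to be on the PATH):

import subprocess
from numba import cuda

# show the driver/runtime view from inside the container
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# True if numba can detect a usable CUDA driver and at least one device
print(cuda.is_available())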
My Dockerfile is:
FROM rapidsai/base:23.10-cuda11.2-py3.10
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsasl2-dev libsasl2-modules gcc g++
RUN pip install kserve==0.10.0 \
sasl==0.3.1 thrift==0.16.0 thrift-sasl==0.4.3 \
pyhive==0.7.0 sqlalchemy==2.0.23 redis==5.0.1 pymysql==1.1.0 statsmodels \
-i https://mirrors.aliyun.com/pypi/simple
RUN pip install httpx==0.25.1 protobuf==4.23.4 fastapi==0.88.0
The packages with possible version conflicts are:
- cudf's original fastapi version is 0.104.1, but I use 0.88.0 to suit kserve.
- kserve's original protobuf is 3.19.0, but I use 4.23.4 to suit the original cudf.
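To confirm which versions actually end up in the final image, a quick check like this can be run inside the container (standard library only; it assumes the distributions are registered under these names, which is the case for pip installs):

from importlib.metadata import version

# print the resolved versions of the packages involved in the conflicts
for pkg in ("kserve", "fastapi", "protobuf", "cudf"):
    print(pkg, version(pkg))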
The server code looks like this:
import kserve

transformer = DriverTransformer()  # DriverTransformer inherits from kserve.Model
server = kserve.ModelServer()
server.start(models=[transformer])
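For context, a minimal transformer for stock kserve 0.10.x looks roughly like the sketch below (the model name and the preprocess body are placeholders; my real DriverTransformer additionally implements the stat_preprocess / inputs2df path that shows up in the traceback):

from typing import Dict
import kserve

class DriverTransformer(kserve.Model):
    # minimal sketch only; the real class has more hooks and state
    def __init__(self, name: str = "driver-transformer"):
        super().__init__(name)
        self.ready = True

    def preprocess(self, inputs: Dict, headers: Dict[str, str] = None) -> Dict:
        # reshape the request payload before it is sent on to the predictor
        return inputs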
The code that triggers the error is:
class DriverTransformer(...):
    def inputs2df(self, inputs: Dict):
        ...
        A_list = [...]
        B_list = [...]
        # A_list and B_list hold the row data; the frame gets four named columns
        df = cudf.DataFrame(A_list + B_list, columns=["C", "D", "E", "F"])
        ...
It gives me this traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/cudf/core/column/column.py", line 2337, in as_column
memoryview(arbitrary), dtype=dtype, nan_as_null=nan_as_null
TypeError: memoryview: a bytes-like object is required, not 'tuple'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.10/site-packages/timing_asgi/middleware.py", line 70, in __call__
await self.app(scope, receive, send_wrapper)
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 235, in app
raw_response = await run_endpoint_function(
File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/lib/python3.10/site-packages/kserve/protocol/rest/v1_endpoints.py", line 106, in statistics
response, response_headers = await self.dataplane.statistics(model_name=model_name, body=body, headers=headers)
File "/opt/conda/lib/python3.10/site-packages/kserve/protocol/dataplane.py", line 333, in statistics
response = await model(body, model_type=ModelType.STATISTICIAN)
File "/opt/conda/lib/python3.10/site-packages/kserve/model.py", line 118, in __call__
payload = await self.stat_preprocess(body, headers) if inspect.iscoroutinefunction(self.stat_preprocess) \
File "/ims/transformer/driver_transformer.py", line 1023, in stat_preprocess
input_df = self.inputs2df(inputs)
File "/ims/transformer/dataprocess_gpu/utils.py", line 12, in wrapper
result = func(*args, **kwargs)
File "/ims/transformer/driver_transformer.py", line 352, in inputs2df
df = cudf.DataFrame(
File "/opt/conda/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/cudf/core/dataframe.py", line 814, in __init__
self._init_from_list_like(
File "/opt/conda/lib/python3.10/site-packages/nvtx/nvtx.py", line 115, in inner
result = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/cudf/core/dataframe.py", line 987, in _init_from_list_like
self._data[col_name] = column.as_column(col)
File "/opt/conda/lib/python3.10/site-packages/cudf/core/column/column.py", line 2523, in as_column
data = as_column(
File "/opt/conda/lib/python3.10/site-packages/cudf/core/column/column.py", line 2009, in as_column
col = ColumnBase.from_arrow(arbitrary)
File "/opt/conda/lib/python3.10/site-packages/cudf/core/column/column.py", line 379, in from_arrow
result = libcudf.interop.from_arrow(data)[0]
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "interop.pyx", line 199, in cudf._lib.interop.from_arrow
RuntimeError: Fatal CUDA error encountered at: /opt/conda/conda-bld/work/cpp/src/bitmask/null_mask.cu:93: 3 cudaErrorInitializationError initialization error
I don't understand where the "tuple" comes from or how to resolve this error.
However, when I run the image directly with docker run -it --rm --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 rapidsai/base:23.10-cuda11.2-py3.10 /bin/bash, or exec into the Kubernetes container above and run the same code in a Python shell, it works without any error.
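Concretely, a check like the following succeeds in the interactive shell (a sketch of what I run by hand; the literal values are made up):

import cudf

# build a small frame the same way inputs2df does: a list of rows plus explicit column names
rows = [(1, 2, 3, 4), (5, 6, 7, 8)]
df = cudf.DataFrame(rows, columns=["C", "D", "E", "F"])
print(df)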
How can I solve this problem? Thanks in advance for your help.