seldon-core tfserver gRPC request returns `StatusCode.UNIMPLEMENTED`


I have a TensorFlow SavedModel (.pb) with the following signature:

The given SavedModel SignatureDef contains the following input(s):
  inputs['Conv1_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: serving_default_Conv1_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['Dense'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
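To make the signature concrete, here is a minimal sketch of how a request shape can be checked against it before sending. It uses only the standard library; the helper name `shape_matches` is hypothetical, and the only values taken from the signature are the two shapes, where -1 marks the variable batch dimension.

```python
from typing import Sequence

# Shapes copied from the SignatureDef above; -1 is the variable batch dimension.
INPUT_SHAPE = (-1, 28, 28, 1)   # inputs['Conv1_input']
OUTPUT_SHAPE = (-1, 10)         # outputs['Dense']


def shape_matches(actual: Sequence[int], expected: Sequence[int]) -> bool:
    """Return True if `actual` fits `expected`, treating -1 as a wildcard.

    Hypothetical helper for illustration; not part of TensorFlow or Seldon.
    """
    if len(actual) != len(expected):
        return False
    return all(e == -1 or a == e for a, e in zip(actual, expected))


# A single 28x28 grayscale image with an explicit batch dimension fits.
print(shape_matches((1, 28, 28, 1), INPUT_SHAPE))
# The same image without the batch dimension does not.
print(shape_matches((28, 28, 1), INPUT_SHAPE))
```

Both clients later in this post send data of shape (1, 28, 28, 1), which fits the declared input.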

I use the Dockerfile below to build and run TensorFlow Serving:

FROM tensorflow/serving

ARG MODEL_PATH

# Define the model base path
ENV MODEL_BASE_PATH=/models

RUN mkdir -p $MODEL_BASE_PATH

# Copy the model into the /models/classifier directory in the container
COPY $MODEL_PATH /models/classifier
ENV MODEL_NAME=classifier

# gRPC port
EXPOSE 8500
# REST port
EXPOSE 8501

I use the Seldon manifest below to deploy TF Serving in a locally running colima cluster:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  annotations:
    seldon.io/executor: "true"
  protocol: tensorflow
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: tf-serve
          imagePullPolicy: Never
          name: model
          ports:
          - containerPort: 8501
            name: http
            protocol: TCP
          - containerPort: 8500
            name: grpc
            protocol: TCP
    graph:
      name: model
      type: MODEL
      endpoint:
        type: GRPC
        httpPort: 8501
        grpcPort: 8500
    name: template
    replicas: 1

The pods and the services look healthy.

After port-forwarding with

kubectl port-forward svc/tfserving-template-model -n seldon-services 8500:8500

I am able to hit the endpoint and get predictions by running the code below.

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

MAX_MESSAGE_LENGTH = 2000000000
REQUEST_TIMEOUT = 90  # seconds

class TfServing:
    def __init__(
        self,
        host_port = "localhost:8500"
    ):
        channel = grpc.insecure_channel(
            host_port,
            options = [
                ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
                ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH)
            ]
        )

        # Stub for TF Serving's native PredictionService
        self.stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        self.req = predict_pb2.PredictRequest()
        self.req.model_spec.name = "classifier"

    def predict(self, image):
        tensor = tf.make_tensor_proto(image)
        self.req.inputs["Conv1_input"].CopyFrom(tensor)
        response = self.stub.Predict(self.req, timeout=REQUEST_TIMEOUT)
        output_tensor_proto = response.outputs["Dense"]
        shape  = tf.TensorShape(output_tensor_proto.tensor_shape)
        result = tf.reshape(output_tensor_proto.float_val, shape)
        return result.numpy()

if __name__ == "__main__":
    serving_model = TfServing()
    # Random image with pixel values 0..255, sent as float32
    predictions = serving_model.predict(
        image = np.float32(
            np.uint8(
                np.random.random((1, 28, 28, 1)) * 255
            )
        )
    )

I am unable to use the SeldonClient to achieve the same.

import numpy as np
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    deployment_name="tfserving",
    namespace="seldon-services",
    gateway_endpoint="localhost:8500",
    grpc_max_send_message_length=20000000,
    grpc_max_receive_message_length=20000000,
)

r = sc.predict(
    gateway="seldon",
    transport="grpc",
    payload_type="tftensor",
    names=["Conv1_input"],
    data=np.float32(
        np.uint8(
            np.random.random((1, 28, 28, 1)) * 255
        )
    ),
)

With the SeldonClient code I receive a StatusCode.UNIMPLEMENTED error:

Success:False message:<_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNIMPLEMENTED
        details = ""
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8500 {grpc_message:"", grpc_status:12, created_time:"2022-12-03T16:56:55.474244-06:00"}"
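The `grpc_status:12` field in `debug_error_string` is the numeric form of `StatusCode.UNIMPLEMENTED`, which gRPC uses when the server does not implement or support the requested method. Below is a minimal sketch, using only the standard library, that pulls the code out of the string and names it. The helper name `status_from_debug_string` is hypothetical; the table lists the standard gRPC status codes.

```python
import re
from typing import Optional, Tuple

# Standard gRPC status codes, keyed by numeric value.
GRPC_STATUS_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED",
    9: "FAILED_PRECONDITION", 10: "ABORTED", 11: "OUT_OF_RANGE",
    12: "UNIMPLEMENTED", 13: "INTERNAL", 14: "UNAVAILABLE",
    15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def status_from_debug_string(debug: str) -> Optional[Tuple[int, str]]:
    """Extract (code, name) from a gRPC debug_error_string, or None.

    Hypothetical helper for illustration; not part of grpc or seldon_core.
    """
    match = re.search(r"grpc_status:(\d+)", debug)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GRPC_STATUS_NAMES.get(code, "UNRECOGNIZED")


# The debug string from the error above.
debug = (
    'UNKNOWN:Error received from peer ipv6:%5B::1%5D:8500 '
    '{grpc_message:"", grpc_status:12, '
    'created_time:"2022-12-03T16:56:55.474244-06:00"}'
)
print(status_from_debug_string(debug))  # (12, 'UNIMPLEMENTED')
```

One thing this suggests checking (an assumption, not confirmed here) is whether the RPC method SeldonClient calls is the same `PredictionService.Predict` method the working stub calls, since the port-forward above goes straight to the TF Serving container on port 8500.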
