I have a TensorFlow SavedModel (.pb) with the following signature:
The given SavedModel SignatureDef contains the following input(s):
  inputs['Conv1_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: serving_default_Conv1_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['Dense'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
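(For reference, the signature above is the saved_model_cli output; roughly the command below, where the directory ./classifier/1 is only an assumed local path to the versioned SavedModel.)

saved_model_cli show --dir ./classifier/1 --tag_set serve --signature_def serving_default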
I use the Dockerfile below to build and run TensorFlow Serving:
FROM tensorflow/serving
ARG MODEL_PATH
# Define the model base path
ENV MODEL_BASE_PATH=/models
RUN mkdir -p $MODEL_BASE_PATH
# Copy the model into the /models/classifier directory in the container
COPY $MODEL_PATH /models/classifier
ENV MODEL_NAME=classifier
# gRPC port
EXPOSE 8500
# REST port
EXPOSE 8501
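Since the manifest below references the image as tf-serve with imagePullPolicy: Never, the image is built locally along these lines (the MODEL_PATH value here is just an assumption about my local layout):

docker build --build-arg MODEL_PATH=./classifier -t tf-serve .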
I use the Seldon manifest below to deploy TF Serving to a locally running Colima cluster:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  annotations:
    seldon.io/executor: "true"
  protocol: tensorflow
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: tf-serve
                imagePullPolicy: Never
                name: model
                ports:
                  - containerPort: 8501
                    name: http
                    protocol: TCP
                  - containerPort: 8500
                    name: grpc
                    protocol: TCP
      graph:
        name: model
        type: MODEL
        endpoint:
          type: GRPC
          httpPort: 8501
          grpcPort: 8500
      name: template
      replicas: 1
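The manifest is applied to the same namespace used in the commands below, roughly as follows (assuming it is saved as tfserving.yaml):

kubectl apply -f tfserving.yaml -n seldon-services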
The pods and the services look healthy.
I am able to hit the endpoint and get predictions after port-forwarding
kubectl port-forward svc/tfserving-template-model -n seldon-services 8500:8500
and running the code below.
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

MAX_MESSAGE_LENGTH = 2000000000
REQUEST_TIMEOUT = 90


class TfServing:
    def __init__(self, host_port="localhost:8500"):
        channel = grpc.insecure_channel(
            host_port,
            options=[
                ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
                ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH),
            ],
        )
        self.stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        self.req = predict_pb2.PredictRequest()
        self.req.model_spec.name = "classifier"

    def predict(self, image):
        # Pack the image into a TensorProto and call Predict on the serving stub
        tensor = tf.make_tensor_proto(image)
        self.req.inputs["Conv1_input"].CopyFrom(tensor)
        response = self.stub.Predict(self.req, REQUEST_TIMEOUT)
        # Reshape the flat float values back into the (-1, 10) output tensor
        output_tensor_proto = response.outputs["Dense"]
        shape = tf.TensorShape(output_tensor_proto.tensor_shape)
        result = tf.reshape(output_tensor_proto.float_val, shape)
        return result.numpy()


if __name__ == "__main__":
    serving_model = TfServing()
    predictions = serving_model.predict(
        image=np.float32(np.uint8(np.random.random((1, 28, 28, 1)) * 255))
    )
However, I am unable to achieve the same with the SeldonClient:
import numpy as np
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    deployment_name="tfserving",
    namespace="seldon-services",
    gateway_endpoint="localhost:8500",
    grpc_max_send_message_length=20000000,
    grpc_max_receive_message_length=20000000,
)

r = sc.predict(
    gateway="seldon",
    transport="grpc",
    payload_type="tftensor",
    names=["Conv1_input"],
    data=np.float32(np.uint8(np.random.random((1, 28, 28, 1)) * 255)),
)
When I run the SeldonClient code, I receive a StatusCode.UNIMPLEMENTED error:
Success:False message:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8500 {grpc_message:"", grpc_status:12, created_time:"2022-12-03T16:56:55.474244-06:00"}"