I have created a TensorFlow object detection model and served it with TensorFlow Serving. I wrote a Python client to test the served model, and it takes around 40 ms to receive all of the predictions.
import datetime

t1 = datetime.datetime.now()
result = stub.Predict(request, 60.0)  # 60-second timeout
t2 = datetime.datetime.now()
print((t2 - t1).total_seconds() * 1000)  # elapsed time in ms
Now, my problem is that when I do the same in Java, it takes far too long: about 10 times as much, 450 to 500 ms.
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
// plus PredictionServiceGrpc and Predict, generated from the TensorFlow Serving protos

ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 9000)
        .usePlaintext(true)
        .build();
PredictionServiceGrpc.PredictionServiceBlockingStub stub =
        PredictionServiceGrpc.newBlockingStub(channel);
......
Instant pre = Instant.now();
Predict.PredictResponse response = stub.predict(request);
Instant curr = Instant.now();
System.out.println("time " + ChronoUnit.MILLIS.between(pre, curr)); // elapsed time in ms
The actual issue was that I was sending all of the image pixels over the network (which was a bad idea). Changing the model's input to an encoded image made the call fast.
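For reference, here is a minimal sketch of how the request can be built around the encoded image bytes instead of a raw pixel tensor. It assumes the exported model accepts a DT_STRING input named "encoded_image_string_tensor" (the usual input name for Object Detection API exports); the model name, signature name, and file path are placeholders:

import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.tensorflow.framework.DataType;
import org.tensorflow.framework.TensorProto;
import org.tensorflow.framework.TensorShapeProto;
import tensorflow.serving.Model;
import tensorflow.serving.Predict;

byte[] jpegBytes = Files.readAllBytes(Paths.get("test.jpg")); // already-encoded image

// One string element holding the raw JPEG bytes, shape [1] for a batch of one.
TensorProto imageTensor = TensorProto.newBuilder()
        .setDtype(DataType.DT_STRING)
        .setTensorShape(TensorShapeProto.newBuilder()
                .addDim(TensorShapeProto.Dim.newBuilder().setSize(1)))
        .addStringVal(ByteString.copyFrom(jpegBytes))
        .build();

Predict.PredictRequest request = Predict.PredictRequest.newBuilder()
        .setModelSpec(Model.ModelSpec.newBuilder()
                .setName("my_model")          // placeholder model name
                .setSignatureName("serving_default"))
        .putInputs("encoded_image_string_tensor", imageTensor)
        .build();

Predict.PredictResponse response = stub.predict(request);

With this, the server decodes the image itself, so the request carries kilobytes instead of a multi-megabyte float tensor, which is where the extra time was going in my case.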