I have searched a lot here but unfortunately could not find an answer.
I am running TensorFlow 1.3 (installed via pip on macOS) on my local machine, and have created a model using the provided "ssd_mobilenet_v1_coco" checkpoints.
I managed to train locally and on the ML Engine (Runtime 1.2), and successfully deployed my SavedModel to the ML Engine.
Local prediction (command below) works fine and I get the model results:
gcloud ml-engine local predict --model-dir=... --json-instances=request.json
FILE request.json: {"inputs": [[[242, 240, 239], [242, 240, 239], [242, 240, 239], [242, 240, 239], [242, 240, 23]]]}
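In case it helps, this is a rough sketch of how the request.json can be generated from an image; the image path and resize dimensions are placeholders, and the "inputs" key is assumed to match the input tensor of my exported SavedModel:

import json
import numpy as np
from PIL import Image

# Placeholder image path and input size -- adjust to your own image and model.
image = Image.open("test_image.jpg").convert("RGB").resize((300, 300))
pixels = np.asarray(image, dtype=np.uint8).tolist()  # nested lists of [R, G, B] per pixel

# Write a single instance with the "inputs" key used in the request above.
with open("request.json", "w") as f:
    json.dump({"inputs": pixels}, f)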
However, when I deploy the model and try to run a remote prediction on the ML Engine with the command below (same JSON file as before):
gcloud ml-engine predict --model "testModel" --json-instances request.json
I get this error:
{
"error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"NodeDef mentions attr 'data_format' not in Op<name=DepthwiseConv2dNative; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=padding:string,allowed=[\"SAME\", \"VALID\"]>; NodeDef: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise = DepthwiseConv2dNative[T=DT_FLOAT, _output_shapes=[[-1,150,150,32]], data_format=\"NHWC\", padding=\"SAME\", strides=[1, 1, 1, 1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read)\n\t [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise = DepthwiseConv2dNative[T=DT_FLOAT, _output_shapes=[[-1,150,150,32]], data_format=\"NHWC\", padding=\"SAME\", strides=[1, 1, 1, 1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read)]]\")"
}
I saw something similar here: https://github.com/tensorflow/models/issues/1581
It describes a problem with the 'data_format' parameter, but unfortunately I could not use that solution since I am already on TensorFlow 1.3.
It also seems that it might be a problem with MobilenetV1: https://github.com/tensorflow/models/issues/2153
Any ideas?
I had a similar issue. It is caused by a mismatch between the TensorFlow versions used for training and inference. I solved it by using TensorFlow 1.4 for both training and inference.
Please refer to this answer.
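To double-check which TensorFlow version a SavedModel was exported with before deploying, something like the sketch below can help (the export directory path is a placeholder):

import tensorflow as tf

print("Local TensorFlow version:", tf.__version__)

# Load the SavedModel's meta graph and read the TensorFlow version recorded at
# export time; "export_dir" is a placeholder for your SavedModel directory.
export_dir = "exported_model/saved_model"
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    print("Exported with TensorFlow:", meta_graph.meta_info_def.tensorflow_version)

The runtime version of the deployed model version (set with --runtime-version when creating the version on the ML Engine) should match the training version as well.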