Getting error on ML-Engine predict but local predict works fine


I have searched a lot here but unfortunately could not find an answer.

I am running TensorFlow 1.3 (installed via pip on macOS) on my local machine, and have created a model using the provided "ssd_mobilenet_v1_coco" checkpoints.

I managed to train both locally and on the ML-Engine (runtime 1.2), and successfully deployed my SavedModel to the ML-Engine.

Local predictions (command below) work fine and I get the model results:

gcloud ml-engine local predict --model-dir=... --json-instances=request.json

 FILE request.json: {"inputs": [[[242, 240, 239], [242, 240, 239], [242, 240, 239], [242, 240, 239], [242, 240, 23]]]}
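For anyone reproducing this, here is a minimal sketch of how such a request line could be generated rather than typed by hand. The function name and the tiny two-pixel image are made up for illustration; only the "inputs" key comes from the file above. As far as I know, --json-instances expects one JSON instance per line, so the instance is serialized without line breaks.

```python
import json


def to_json_line(pixels, input_key="inputs"):
    """Serialize one prediction instance as a single line of JSON.

    `pixels` is a nested list: rows -> pixels -> [R, G, B].
    `input_key` must match the input name in the model's serving signature
    ("inputs" is the key used in the question).
    """
    return json.dumps({input_key: pixels})


# Hypothetical 1x2 RGB image, for illustration only.
tiny_image = [[[242, 240, 239], [242, 240, 239]]]
line = to_json_line(tiny_image)
print(line)
```

Writing that line to request.json gives a file in the same shape as the one used for both the local and the remote call.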

However, when I run remote predictions against the deployed model on the ML-Engine with the command below (same request.json as before):

gcloud ml-engine predict --model "testModel" --json-instances request.json

I get this error:

  "error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"NodeDef mentions attr 'data_format' not in Op<name=DepthwiseConv2dNative; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=padding:string,allowed=[\"SAME\", \"VALID\"]>; NodeDef: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise = DepthwiseConv2dNative[T=DT_FLOAT, _output_shapes=[[-1,150,150,32]], data_format=\"NHWC\", padding=\"SAME\", strides=[1, 1, 1, 1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read)\n\t [[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise = DepthwiseConv2dNative[T=DT_FLOAT, _output_shapes=[[-1,150,150,32]], data_format=\"NHWC\", padding=\"SAME\", strides=[1, 1, 1, 1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Relu6, FeatureExtractor/MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read)]]\")"

I saw something similar here, where the problem was with the "data_format" parameter. Unfortunately I could not use that solution, since I am already on TensorFlow 1.3.

It also seems that it might be a problem with MobilenetV1: see tensorflow/models issue #2153.

Any ideas?


There are 2 answers


I had a similar issue. It is caused by a mismatch between the TensorFlow versions used for training and for inference. I solved it by using TensorFlow 1.4 for both training and inference.

Please refer to this answer.

wcyn On

If you're wondering how to make sure that your model version runs the TensorFlow version you need, first have a look at this model versions list page.

You need to know which ML runtime version supports the TensorFlow version that you need. At the time of writing:

  • ML version 1.4 supports TensorFlow 1.4.0 and 1.4.1
  • ML version 1.2 supports TensorFlow 1.2.0 and
  • ML version 1.0 supports TensorFlow 1.0.1
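The pattern in that list is that the runtime version is the major.minor part of the TensorFlow version. A rough sketch of that lookup follows. The helper name is made up, the set of runtimes is just the three listed above, and the real list changes over time, so check the versions page before relying on it.

```python
# Runtime versions named in the list above (an assumption: the live list
# on the versions page may differ by the time you read this).
KNOWN_RUNTIMES = {"1.0", "1.2", "1.4"}


def runtime_for(tf_version):
    """Return the 'major.minor' runtime for a TensorFlow version string,
    or None if that runtime is not in KNOWN_RUNTIMES."""
    parts = tf_version.split(".")
    if len(parts) < 2:
        return None
    candidate = ".".join(parts[:2])
    return candidate if candidate in KNOWN_RUNTIMES else None


print(runtime_for("1.4.1"))
print(runtime_for("1.3.0"))
```

A result of None means there is no matching runtime in the list, which is the situation where a model exported with one TensorFlow version ends up served by another.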

Now that you know which runtime version you require, create a new version of your model with that runtime, like so:

gcloud ml-engine versions create <version name> \
--model=<Name of the model> \
--origin=<Model bucket link. It starts with gs://...> \
--runtime-version=<Runtime version, e.g. 1.4>

In my case, I needed to predict using TensorFlow 1.4.1, so I used runtime version 1.4.
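If you create versions from a script, the command can be assembled programmatically. This is only a sketch: the function name, "v1" and the bucket path are placeholders I made up, while --model, --origin and --runtime-version are the gcloud flags for this command.

```python
import shlex


def versions_create_command(version, model, origin, runtime_version):
    """Build the `gcloud ml-engine versions create` command line as a string.

    All four arguments are caller-supplied; nothing is executed here.
    """
    args = [
        "gcloud", "ml-engine", "versions", "create", version,
        "--model=" + model,
        "--origin=" + origin,
        "--runtime-version=" + runtime_version,
    ]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Placeholder values for illustration only.
cmd = versions_create_command("v1", "testModel", "gs://my-bucket/export", "1.4")
print(cmd)
```

Printing the command instead of running it lets you confirm the runtime version before anything is deployed.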

For more detail, refer to this official MNIST tutorial page, as well as this ML versioning page.