gcloud jobs submit prediction 'can't decode json' with --data-format=TF_RECORD


I uploaded some test data to gcloud for prediction as a binary TFRecord file. Running my script, I got the error ('No JSON object could be decoded', 162). What am I doing wrong?

To submit a prediction job to gcloud, I use this script:

REGION=us-east1
MODEL_NAME=mymodel
VERSION=v_hopt_22
INPUT_PATH=gs://mydb/test-data.tfr
OUTPUT_PATH=gs://mydb/prediction.tfr
JOB_NAME=pred_${MODEL_NAME}_${VERSION}_b

args=" --model "$MODEL_NAME
args+=" --version "$VERSION

args+=" --data-format=TF_RECORD"
args+=" --input-paths "$INPUT_PATH
args+=" --output-path "$OUTPUT_PATH

args+=" --region "$REGION

gcloud ml-engine jobs submit prediction $JOB_NAME $args
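
For comparison, the same invocation can be assembled as a Python list, which avoids relying on shell word-splitting of $args. This is only a sketch using the flags and values from the script above; build_prediction_command is a hypothetical helper name, and nothing is submitted here.

```python
def build_prediction_command(job_name, model, version,
                             input_path, output_path, region):
    """Return the gcloud call from the script above as an argument list."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "prediction", job_name,
        "--model", model,
        "--version", version,
        "--data-format=TF_RECORD",
        "--input-paths", input_path,
        "--output-path", output_path,
        "--region", region,
    ]


model, version = "mymodel", "v_hopt_22"
cmd = build_prediction_command(
    job_name="pred_{}_{}_b".format(model, version),
    model=model,
    version=version,
    input_path="gs://mydb/test-data.tfr",
    output_path="gs://mydb/prediction.tfr",
    region="us-east1",
)
print(" ".join(cmd))
```

Passing cmd to subprocess.check_call would run it; the list form keeps each value as a single argument even if it contains spaces.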

test-data.tfr was generated from a numpy array, like so:

import numpy as np

filename = './Datasets/test-data.npz'
data = np.load(filename)
features = data['X'] # features[channel, example, feature]
np_features = np.swapaxes(features, 0, 1) # features[example, channel, feature]

import tensorflow as tf
import nnscoring.data as D

def floats_feature(arr):
    return tf.train.Feature(float_list=tf.train.FloatList(value=arr.flatten().tolist()))

writer = tf.python_io.TFRecordWriter("./Datasets/test-data.tfr")

for i, np_example in enumerate(np_features):
    if i%1000==0: print(i)
    tf_feature = {  
        ch: floats_feature(x)
        for ch, x in zip(D.channels, np_example)
    }
    tf_features = tf.train.Features(feature=tf_feature)
    tf_example = tf.train.Example(features=tf_features)
    writer.write(tf_example.SerializeToString())

writer.close()
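
One way to sanity-check that the written file really is a binary TFRecord file with the expected number of records, without loading TensorFlow, is to walk the record framing directly. The sketch below assumes the standard TFRecord layout (8-byte little-endian length, 4-byte CRC of the length, payload, 4-byte CRC of the payload); iter_tfrecord_payloads and _frame are hypothetical helper names, and the CRCs are skipped rather than verified.

```python
import io
import struct


def iter_tfrecord_payloads(stream):
    """Yield raw record payloads from a TFRecord byte stream.

    CRC fields are read past but not verified.
    """
    while True:
        header = stream.read(12)      # 8-byte length + 4-byte length CRC
        if not header:
            return                    # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)       # 4-byte payload CRC
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def _frame(payload):
    # Demo-only framing with zeroed CRC fields; TensorFlow itself
    # would reject such a file because the checksums are wrong.
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


demo = io.BytesIO(_frame(b"first") + _frame(b"second example"))
payloads = list(iter_tfrecord_payloads(demo))
print(len(payloads), [len(p) for p in payloads])  # prints: 2 [5, 14]
```

Opening ./Datasets/test-data.tfr in "rb" mode and counting the yielded payloads should give the same number as len(np_features) if the file was written completely.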

Update (following yxshi):

I define the following serving function

def tfrecord_serving_input_fn():
    import tensorflow as tf
    seq_length = int(dt*sr) 
    examples = tf.placeholder(tf.string, shape=())
    feat_map = {
        channel: tf.FixedLenSequenceFeature(shape=(seq_length,),
            dtype=tf.float32, allow_missing=True)
        for channel in channels
    }
    parsed = tf.parse_single_example(examples, features=feat_map)
    features = {
        channel: tf.expand_dims(tensor, -1)
        for channel, tensor in parsed.items()
    }
    from collections import namedtuple
    InputFnOps = namedtuple("InputFnOps", "features labels receiver_tensors")
    tf.contrib.learn.utils.input_fn_utils.InputFnOps = InputFnOps
    return InputFnOps(features=features, labels=None, receiver_tensors=examples)
    # InputFnOps = tf.contrib.learn.utils.input_fn_utils.InputFnOps
    # return InputFnOps(features, None, parsed)
    # Error: InputFnOps has no attribute receiver_tensors

... which I pass to generate_experiment_fn like so:

export_strategies = [
    saved_model_export_utils.make_export_strategy(
        tfrecord_serving_input_fn,
        exports_to_keep=1,
        default_output_alternative_key=None,
    )
]

gen_exp_fn = generate_experiment_fn(
    train_steps_per_iteration=args.train_steps_per_iteration,
    train_steps=args.train_steps,
    export_strategies=export_strategies,
)

(aside: note the dirty patch of InputFnOps)

Answer by yxshi:

It looks like the input is not correctly specified in the inference graph. To use TF_RECORD as the input data format, your inference graph must accept a batch of strings as its input placeholder, each string being one serialized tf.train.Example record. In your case, you should have something like the following in your inference code:

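 examples = tf.placeholder(tf.string, name='input', shape=(None,))
 with tf.name_scope('inputs'):
   # Parsing spec (not tf.train.Feature values): one fixed-length float
   # feature per channel; seq_length is the number of floats per channel,
   # as in the question.
   feature_map = {
     ch: tf.FixedLenFeature(shape=[seq_length], dtype=tf.float32)
     for ch in D.channels
   }
   parsed = tf.parse_example(examples, features=feature_map)
   f1 = parsed['feature_name_1']
   f2 = parsed['feature_name_2']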

 ...

A close example is here: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/flowers/trainer/model.py#L253

Hope it helps.