TensorFlow Transform debugging and iterative development best practices?


I've been using TFX for several projects now and have found debugging tf.transform to be extremely challenging. My first few pipelines were composed in Airflow, which required me to commit code, push it to Airflow, and run the DAG just to get feedback on a change. This was pretty terrible, as it put at least a 2-minute delay in the iterative development loop. Using TFX within the notebooks provided in the examples helps significantly, as it cuts that down to about a minute.

If the code works, you can use something like this to dig into the resulting protobufs and inspect the transformed output:

%%skip_for_export
import os
import pprint

import tensorflow as tf

train_transform_uri = os.path.join(transform.outputs['transformed_examples'].get()[0].uri, 'train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_transform_uri, name)
                      for name in os.listdir(train_transform_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first few tfrecords and decode them.
for tfrecord in dataset.take(5):
    serialized_example = tfrecord.numpy()
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    pprint.pprint(example)

Are there any other methods that folks are using to deal with this pain? Am I going about things wrong? I definitely see the benefit of using Transform to avoid training/serving skew, but I feel like the pain of Transform may actually be worse...

There is 1 answer

Brzoskwinia

I came across this question while looking for a solution to exactly the same problem. It's been two weeks since I started my research, and I still haven't found anything better than printing decoded examples. So the answer to the "Are there any other methods" question is: most likely not.
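
Since printing decoded examples seems to be the fallback, a small helper can at least make that output easier to scan. The following is only a minimal sketch and not part of TFX or tf.transform: `example_to_dict` is a hypothetical name, and it assumes each record has already been parsed into a `tf.train.Example`, as in the loop from the question. It relies only on the standard protobuf accessors (`features.feature` and `WhichOneof`).

```python
def example_to_dict(example):
    """Flatten a parsed tf.train.Example into {feature_name: list_of_values}.

    Hypothetical helper for inspection only; `example` is assumed to be a
    tf.train.Example that was already filled via ParseFromString().
    """
    result = {}
    for name, feature in example.features.feature.items():
        # Each Feature holds exactly one of bytes_list, float_list or
        # int64_list; WhichOneof returns that field's name, or None if unset.
        kind = feature.WhichOneof("kind")
        result[name] = list(getattr(feature, kind).value) if kind else []
    return result
```

Inside the loop from the question, `pprint.pprint(example_to_dict(example))` would then print one plain dict per record instead of the full protobuf text. Bytes features come back as Python `bytes`, so string values may still need a `.decode()` call.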