TensorFlow Estimator passes train data through some weird normalization

Problem Description

I'm using the TensorFlow Estimator API and have encountered a weird phenomenon. I'm passing the exact same input_fn to both training and evaluation, and for some reason the images provided to the network are not identical. They look similar at first, but on closer inspection the evaluation images are fine, while the training images are somewhat distorted.

After loading them both, I noticed that for some reason the training images go through some kind of ReLU. I confirmed this with the following code, which operates on mat_eval and mat_train, the tensors that the input_fn provides in evaluation and train mode:

# rescale [0, 1] pixels to [-1, 1], then zero out the negatives (a ReLU)
special_relu = lambda mat: ((mat - 0.5) / 0.5) * ((mat - 0.5) / 0.5 > 0)
np.allclose(mat_train, special_relu(mat_eval))
>>> True
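
To make the check concrete: special_relu is just "rescale [0, 1] pixels to [-1, 1], then clip the negatives to zero", i.e. a ReLU applied after normalization. A tiny worked example (the pixel values here are made up, not from my dataset):

import numpy as np

mat_eval_example = np.array([0.2, 0.5, 0.9])   # what an evaluation image might contain
scaled = (mat_eval_example - 0.5) / 0.5        # rescale [0, 1] -> [-1, 1]: [-0.6, 0.0, 0.8]
mat_train_example = np.maximum(scaled, 0.0)    # ReLU: [0.0, 0.0, 0.8]

special_relu = lambda mat: ((mat - 0.5) / 0.5) * ((mat - 0.5) / 0.5 > 0)
np.allclose(mat_train_example, special_relu(mat_eval_example))
>>> True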

What I thought and tried

My initial thought was that it is some form of BatchNormalization. But BatchNormalization is supposed to happen inside the network, not as a preprocessing step, isn't it? What I recorded (using tf.summary.image) is the features['image'] object passed to my model_fn, and if I understand correctly, the features object reaches model_fn from the input_fn that the Estimator calls.
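
For context, this is roughly where that summary is recorded. My real model_fn (built by dcnn.modeling for 'small_inception') is much larger, so everything below except features['image'] and the image summary is a placeholder head and loss, added only so the sketch is a complete model_fn for TRAIN/EVAL; I'm using the tf.compat.v1 spelling so the sketch also works under TF 2.

import tensorflow as tf

def model_fn(features, labels, mode, params):
    # This is the tensor that looks distorted in TRAIN mode but fine in EVAL mode.
    tf.compat.v1.summary.image('input_image', features['image'], max_outputs=4)

    # Placeholder head and loss; the real network is built by dcnn.modeling.
    flat = tf.compat.v1.layers.flatten(features['image'])
    logits = tf.compat.v1.layers.dense(flat, 1)
    loss = tf.reduce_mean(tf.square(logits))
    train_op = tf.compat.v1.train.GradientDescentOptimizer(1e-3).minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)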

Regardless, I tried removing the parts of the code that are supposed to apply the BatchNormalization. This had no effect. Of course, I might not have done that correctly, but as I said, I don't really think it is BatchNormalization.
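
One sanity check I can think of (a sketch, not something from the code below): build the dataset from the same input_fn directly, without the Estimator or model_fn at all, and compare the raw batches. This assumes TF 2.x eager execution, that the function returned by get_input_fn can be called with no arguments and yields (features, labels) pairs, and that the pipeline isn't shuffled; if the raw batches are already identical here, then whatever distorts the training images happens somewhere between input_fn and the image summary.

import numpy as np

from dcnn.variant_io import get_input_fn

def first_image_batch(input_fn):
    # Take one (features, labels) batch out of the dataset that input_fn builds.
    features, _ = next(iter(input_fn()))
    return features['image'].numpy()

raw_train = first_image_batch(get_input_fn('same_example', repeat=True))
raw_eval = first_image_batch(get_input_fn('same_example', repeat=False))
print(np.allclose(raw_train, raw_eval))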

Code

from datetime import datetime
from pathlib import Path

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.platform import tf_logging as logging

from dcnn import modeling
from dcnn.dv_constants import BATCH_SIZE, BATCHES_PER_EPOCH
from dcnn.variant_io import get_input_fn, num_variants_in_ds

logging.set_verbosity(logging.INFO)
new_checkpoint_name = lambda: f'./train_dir/' \
                              f'{datetime.now().strftime("%d-%m %H:%M:%S")}'
if __name__ == '__main__':
    model_name = 'small_inception'
    start_from_checkpoint = ''
    # start_from_checkpoint = '/home/yonatan/Desktop/yonas_code/dcnn/train_dir' \
    #                        '/2111132905/model.ckpt-256'
    model_dir = str(Path(start_from_checkpoint).parent) if \
        start_from_checkpoint else new_checkpoint_name()
    test = False
    train = True
    predict = False
    epochs = 1

    train_dataset_name = 'same_example'
    val_dataset_name = 'same_example'
    test_dataset_name = 'same_example'
    predict_dataset_name = 'same_example'

    model = modeling.get_model(model_name=model_name)
    estimator = model.make_estimator(
        batch_size=BATCH_SIZE,
        model_dir=model_dir,
        params=dict(batches_per_epoch=BATCHES_PER_EPOCH),
        use_tpu=False,
        master='',
        # The target of the TensorFlow standard server to use. Can be the empty
        # string to run locally using an in-process server.
        start_from_checkpoint=start_from_checkpoint)

    if train:
        train_input_fn = get_input_fn(train_dataset_name, repeat=True)
        val_input_fn = get_input_fn(val_dataset_name, repeat=False)
        steps = (epochs * num_variants_in_ds(train_dataset_name)) // \
                BATCH_SIZE
        # intentionally the same input_fn as evaluation (see problem description)
        train_spec = tf.estimator.TrainSpec(input_fn=val_input_fn,
                                            max_steps=steps)
        eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
                                          throttle_secs=1)
        metrics = tf.estimator.train_and_evaluate(estimator, train_spec,
                                                  eval_spec)
        print(metrics)

I have plenty more code to share, but I tried to be concise. If anyone has any idea why this behavior happens, or needs more information, let me know.
