I am running two TensorFlow models one after the other using the tf.estimator.train_and_evaluate function.
# First model: training length is set with max_steps in TrainSpec
train_spec = tf.estimator.TrainSpec(input_fn=my_input_fn("train"), max_steps=max_steps)
eval_spec = tf.estimator.EvalSpec(input_fn=my_input_fn("valid"), steps=None)
tf.estimator.train_and_evaluate(model1, train_spec, eval_spec)
# Second model: max_steps=None, training length is set with StopAtStepHook
hook = [tf.estimator.StopAtStepHook(num_steps=max_steps)]
train_spec = tf.estimator.TrainSpec(input_fn=my_input_fn("train"), max_steps=None, hooks=hook)
eval_spec = tf.estimator.EvalSpec(input_fn=my_input_fn("valid"), steps=None)
tf.estimator.train_and_evaluate(model2, train_spec, eval_spec)
The first trains OK but the second trains for only 1 step:
...
INFO:tensorflow:Saving dict for global step 1: LogLoss = 0.06514542, PR_AUC = 0.012231247, ROC_AUC = 0.52047175, global_step = 1, label/mean = 0.011529858, loss = 0.06514542, prediction/mean = 0.016156415
...
INFO:tensorflow:Loss for final step: 0.32117385.
I am trying to run two models sequentially on the same dataset using tf.estimator.train_and_evaluate, and I expect both trainings to behave similarly. However, the second training runs for only 1 step and then finishes.
Solution: I was using tf.estimator.inputs.numpy_input_fn (https://docs.w3cub.com/tensorflow~1.15/estimator/inputs/numpy_input_fn) as the input_fn of the second train_spec. When num_epochs=1, or when num_epochs is not passed at all, the input runs out and training terminates early. So, in the solution, three things should be done: set num_epochs=None in numpy_input_fn, which solves the early termination; set max_steps=None in the second training and use the stopping hook (StopAtStepHook) to end it; and, in my case, do not use an early stopping hook in the second training. However, this forces the second training to run for the full specified length instead of stopping early.
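To illustrate why the number of epochs matters, here is a minimal plain-Python sketch of the mechanism. It is a simulation, not TensorFlow code: batch_stream and train are hypothetical helpers, and the dataset size (100 rows) and batch size (128) are assumed values chosen so that one epoch fits in a single batch. It shows that an input limited to one epoch is exhausted before the stop step is reached, while an input that repeats indefinitely lets the stop condition decide.

```python
import itertools


def batch_stream(data, batch_size, num_epochs):
    """Yield batches of `data`; num_epochs=None repeats the data forever."""
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for start in range(0, len(data), batch_size):
            yield data[start:start + batch_size]


def train(batches, stop_at_step):
    """Run one step per batch; end at stop_at_step or when the input runs out."""
    step = 0
    for _ in batches:
        step += 1
        if step >= stop_at_step:
            break  # plays the role of the stopping hook
    return step


data = list(range(100))  # assumed: fewer rows than one batch

# One epoch only: the single batch is consumed and training ends after 1 step.
steps_one_epoch = train(batch_stream(data, batch_size=128, num_epochs=1), stop_at_step=1000)

# Repeating input: training continues until the stop step is reached.
steps_repeat = train(batch_stream(data, batch_size=128, num_epochs=None), stop_at_step=1000)

print(steps_one_epoch, steps_repeat)  # 1 1000
```

With a repeating input, the only thing that ends the loop is the stop condition, which is why the stopping hook is needed once num_epochs=None is set.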