I built a dataset for training a two-headed neural network: the first head is an LSTM and the second a simple perceptron.

I process the dataset in two ways: one version is split into a train set and a test set, and the second version is left unsplit so that I can train, test, and run a simulation on the complete data at the end.

Here is my code to do that:

# Functions to split the initial dataset into train and test datasets
# (self.val_split is the validation fraction defined elsewhere in the class):
def is_test(x, _):
    return x % int(self.val_split * 100) == 0

def is_train(x, y):
    return not is_test(x, y)

recover = lambda x, y: y  # drop the enumeration index again

# Select every N-th element for the test/validation set.
test_set = full_dataset.enumerate().filter(is_test).map(recover)

# Keep the remaining elements for training.
trainning_set = full_dataset.enumerate().filter(is_train).map(recover)

test_set = test_set.batch(batch_size).cache().prefetch(2)
trainning_set = trainning_set.batch(batch_size).cache().prefetch(2)

full_dataset = full_dataset.batch(batch_size).cache().prefetch(2)
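
For reference, here is a minimal standalone sketch of the same enumerate/filter/map split on a toy dataset (val_split = 0.1 is just a placeholder value here):

import tensorflow as tf

val_split = 0.1  # placeholder: every int(0.1 * 100) == 10th element goes to the test set

def is_test(index, _):
    return index % int(val_split * 100) == 0

def is_train(index, element):
    return tf.logical_not(is_test(index, element))

recover = lambda index, element: element  # drop the enumeration index

full = tf.data.Dataset.range(100)  # toy data: 0..99
test = full.enumerate().filter(is_test).map(recover)
train = full.enumerate().filter(is_train).map(recover)

print(list(test.as_numpy_iterator()))        # [0, 10, 20, ..., 90]
print(len(list(train.as_numpy_iterator())))  # 90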

Checking each dataset:

full_dataset:
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

test_set: 
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

trainning_set:
<_PrefetchDataset element_spec=({'input1': TensorSpec(shape=(None, None, 3), dtype=tf.float32, name=None), 'input2': TensorSpec(shape=(None, 13), dtype=tf.float32, name=None)}, TensorSpec(shape=(None,), dtype=tf.float32, name=None))>

Now, why does training my model with the split sets work fine,

model.fit(trainning_set, validation_data=data.test_set)

while training it with all the data doesn't work and produces NaN?!

model.fit(full_dataset)

Epoch 1/5
160/160 - 2s - loss: nan - nash_sutcliffe: nan - 2s/epoch - 12ms/step
Epoch 2/5
160/160 - 0s - loss: nan - nash_sutcliffe: nan - 319ms/epoch - 2ms/step
...

I did some searching and testing but can't find what is different between these two versions of the dataset, or why one works and the other doesn't.

Here are samples of my test_set and full_dataset before batching. As you can see, they are the same, except that in the test_set the values of input1 are more rounded (?!) but still float32:

for inputs, targets in test_set.take(1):
    print("Feature:", inputs)
    print("Label:", targets)

Feature: {'input1': <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 0.  , 16.12,  0.  ],
       [ 0.  , 17.42,  0.57],
       [ 0.  , 11.36, 13.97],
       [ 0.  , 10.55,  0.96],
       [ 0.  , 11.56,  0.24]], dtype=float32)>, 'input2': <tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.4391040e+02, 5.4850894e+03, 8.7901926e+00, 3.6657768e+01,
       5.4554661e+01, 9.5567673e+01, 2.0000000e+00, 5.8438915e+01,
       2.0383540e+03, 6.7381866e+01, 5.6437737e+01, 4.7759323e+00,
       0.0000000e+00], dtype=float32)>}
Label: tf.Tensor(0.91, shape=(), dtype=float32)

for inputs, targets in full_dataset.take(1):
    print("Feature:", inputs)
    print("Label:", targets)

Feature: {'input1': <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[0.000e+00, 9.860e+00, 0.000e+00],
       [0.000e+00, 1.308e+01, 0.000e+00],
       [0.000e+00, 1.433e+01, 1.000e-02],
       [0.000e+00, 1.630e+01, 0.000e+00],
       [0.000e+00, 1.644e+01, 0.000e+00]], dtype=float32)>, 'input2': <tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.4391040e+02, 5.4850894e+03, 8.7901926e+00, 3.6657768e+01,
       5.4554661e+01, 9.5567673e+01, 2.0000000e+00, 5.8438915e+01,
       2.0383540e+03, 6.7381866e+01, 5.6437737e+01, 4.7759323e+00,
       0.0000000e+00], dtype=float32)>}
Label: tf.Tensor(0.79, shape=(), dtype=float32)
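
To compare the two pipelines beyond single samples, something like this could aggregate statistics over every element (a rough sketch using the dataset and input names from above, run before batching):

import tensorflow as tf

def feature_stats(ds, key):
    # Flatten one input head across the whole (unbatched) dataset and summarise it.
    values = tf.concat([tf.reshape(inputs[key], [-1]) for inputs, _ in ds], axis=0)
    return (tf.reduce_min(values).numpy(),
            tf.reduce_max(values).numpy(),
            tf.reduce_mean(values).numpy())

for key in ("input1", "input2"):
    print(key, "test:", feature_stats(test_set, key))
    print(key, "full:", feature_stats(full_dataset, key))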

1 Answer

mhenning (accepted answer):

(copied from comments)

Did you try more epochs on the split set? To me, it looks like both should go towards NaN values, because you work with unscaled data, and I assume something like ReLU activation functions in the model. The full_dataset should arrive at NaNs faster, because there is more data per epoch and therefore more gradient steps with the same batch size. More gradient updates per epoch lead to faster exploding weights in the network.

Solution: Use something like StandardScaler on your data (and don't forget to split the data into train & test first, and fit the scaler only on the train data).
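
A minimal sketch of that suggestion, assuming the raw features are available as NumPy arrays before the tf.data pipeline is built (the array names, shapes, and values below are placeholders, not from the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder stand-ins for the raw 'input2' features (13 columns, as in the question).
rng = np.random.default_rng(0)
x2_train = rng.uniform(0.0, 5000.0, size=(800, 13)).astype("float32")
x2_test = rng.uniform(0.0, 5000.0, size=(200, 13)).astype("float32")

scaler = StandardScaler()
x2_train_scaled = scaler.fit_transform(x2_train)  # fit the scaler on the training split only
x2_test_scaled = scaler.transform(x2_test)        # reuse the train mean/std for the test split

# The unsplit "full" dataset used for the final simulation should be transformed
# with this same fitted scaler as well, never re-fitted on all of the data.

The sequence input (input1) can be treated the same way by flattening the time steps into the sample dimension before fitting the scaler.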