I have a batched tf.data.Dataset of 3D data for training, and I need to split it into train_X and train_Y because my main system requires that format. I used the method below to split it, but I get strange results. Can someone comment or help? I am not very experienced with TensorFlow.
datasetX = dataset1.map(lambda x, y: x)
datasetY = dataset1.map(lambda x, y: y)
This method works for a simple tf.data.Dataset of tuples, but it shows strange behavior on my dataset.
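To illustrate the simple tuple case, here is a minimal sketch (assuming TF 2.x, with hypothetical toy arrays x and y) where the map-based split does line up with the original pairs, because no shuffle is involved:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 3 samples with 2 features each, 3 labels.
x = np.arange(6).reshape(3, 2)
y = np.arange(3)

ds = tf.data.Dataset.from_tensor_slices((x, y))
ds_x = ds.map(lambda a, b: a)
ds_y = ds.map(lambda a, b: b)

xs = list(ds_x.as_numpy_iterator())
ys = list(ds_y.as_numpy_iterator())
# With a deterministic pipeline, xs and ys come out in the
# same order as the original (x, y) pairs.
```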
I created a dummy dataset with the same format as mine, shown below.
import tensorflow as tf
import numpy as np

window_size = 4
batch_size = 5
shuffle_buffer_size = 1000
n_character = 6

# Dummy data: 20 rows of 6 consecutive integers each.
x_train_All = np.arange(0, window_size * batch_size * n_character)
x_train_All = np.reshape(x_train_All, (window_size * batch_size, n_character))

dataset = tf.data.Dataset.from_tensor_slices(x_train_All)
# Sliding windows of window_size + 1 rows, stepping by one row.
dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
# Input is the first window_size rows, target is the same window shifted by one row.
dataset = dataset.map(lambda window: (window[:-1], window[1:]))
dataset1 = dataset.shuffle(shuffle_buffer_size)

# Split into inputs and targets.
datasetX = dataset1.map(lambda x, y: x)
datasetY = dataset1.map(lambda x, y: y)
dataset_Num_X = []
dataset_Num_Y = []
dataset_NumXAfterSplit = []
dataset_NumYAfterSplit = []

# Collect the (x, y) pairs from the combined dataset.
for element in dataset1.as_numpy_iterator():
    e, f = element
    dataset_Num_X.append(e)
    dataset_Num_Y.append(f)

# Collect the same components from the split datasets.
for window in datasetX.as_numpy_iterator():
    dataset_NumXAfterSplit.append(window)

for window in datasetY.as_numpy_iterator():
    dataset_NumYAfterSplit.append(window)
By design, dataset_Num_X should match dataset_NumXAfterSplit and dataset_Num_Y should match dataset_NumYAfterSplit, but they do not. Any help would be greatly appreciated.
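For reference, here is a minimal sketch (assuming TF 2.x) of the same windowed pipeline with the shuffle step left out; in that variant the split datasets do stay aligned with the combined dataset, so the mismatch seems tied to shuffling somehow:

```python
import numpy as np
import tensorflow as tf

window_size = 4
batch_size = 5
n_character = 6

x_train_All = np.arange(window_size * batch_size * n_character)
x_train_All = np.reshape(x_train_All, (window_size * batch_size, n_character))

ds = tf.data.Dataset.from_tensor_slices(x_train_All)
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size + 1))
ds = ds.map(lambda w: (w[:-1], w[1:]))  # note: no shuffle here

ds_x = ds.map(lambda x, y: x)
ds_y = ds.map(lambda x, y: y)

pairs = list(ds.as_numpy_iterator())
xs = list(ds_x.as_numpy_iterator())
ys = list(ds_y.as_numpy_iterator())
# With the deterministic pipeline, each split element matches
# the corresponding component of the (x, y) pair.
```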
Best,