Split a tf.data.Dataset of tuples into two datasets

I have a tf.data.Dataset of 3D batches for training, and I need to split it into train_X and train_Y because my main system requires them separately. I used the method below to split it, but I get strange results. Can someone comment or help? I am not very experienced with TensorFlow.

datasetX = dataset1.map(lambda x,y : x)
datasetY = dataset1.map(lambda x,y : y)

This method works for a simple tf.data.Dataset of tuples, but it behaves strangely on my dataset.

I created a dummy dataset with the same format as mine, shown below.

import tensorflow as tf
import numpy as np

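window_size = 4
batch_size = 5
shuffle_buffer_size = 1000
n_character = 6

# Dummy data: window_size * batch_size rows, n_character columns.
x_train_All = np.arange(0, window_size * batch_size * n_character)
x_train_All = np.reshape(x_train_All, (window_size * batch_size, n_character))

dataset = tf.data.Dataset.from_tensor_slices(x_train_All)
# Sliding windows of window_size + 1 rows, one row apart.
dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
# Turn each window (a nested dataset) into a single tensor.
dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
# Input is all rows but the last; target is all rows but the first.
dataset = dataset.map(lambda window: (window[:-1], window[1:]))
dataset1 = dataset.shuffle(shuffle_buffer_size)

# Split the (x, y) tuples into two separate datasets.
datasetX = dataset1.map(lambda x, y: x)
datasetY = dataset1.map(lambda x, y: y)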

dataset_Num_X=[]
dataset_Num_Y=[]
dataset_NumXAfterSplit=[]
dataset_NumYAfterSplit=[]

for element in dataset1.as_numpy_iterator():
    e, f = element
    dataset_Num_X.append(e)
    dataset_Num_Y.append(f)

for window in datasetX.as_numpy_iterator():
    dataset_NumXAfterSplit.append(window)

for window in datasetY.as_numpy_iterator():
    dataset_NumYAfterSplit.append(window)

Based on the design, dataset_Num_X should be the same as dataset_NumXAfterSplit, and dataset_Num_Y should be the same as dataset_NumYAfterSplit, but they are not. Any help would be greatly appreciated.
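
A likely cause, offered as a guess rather than a confirmed diagnosis: shuffle() defaults to reshuffle_each_iteration=True, so each of the three loops above starts a new iteration and draws a new random order. The elements would then be the same in all four lists, only in a different order. The sketch below rebuilds the dummy dataset and fixes the order with reshuffle_each_iteration=False plus an explicit seed. The seed value 42 and the names shuffled, dataset_x, dataset_y, x_matches and y_matches are illustrative and not from the original code.

```python
import numpy as np
import tensorflow as tf

window_size = 4
batch_size = 5
n_character = 6

# Same dummy data as in the question.
data = np.arange(window_size * batch_size * n_character).reshape(
    window_size * batch_size, n_character
)

ds = tf.data.Dataset.from_tensor_slices(data)
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size + 1))
ds = ds.map(lambda w: (w[:-1], w[1:]))

# Assumption: a fixed shuffle order is acceptable for this use case.
# The seed (42) is an arbitrary illustrative value.
shuffled = ds.shuffle(1000, seed=42, reshuffle_each_iteration=False)

dataset_x = shuffled.map(lambda x, y: x)
dataset_y = shuffled.map(lambda x, y: y)

pairs = list(shuffled.as_numpy_iterator())
xs = list(dataset_x.as_numpy_iterator())
ys = list(dataset_y.as_numpy_iterator())

# Compare the split datasets element by element against the tuple dataset.
x_matches = all(np.array_equal(p[0], x) for p, x in zip(pairs, xs))
y_matches = all(np.array_equal(p[1], y) for p, y in zip(pairs, ys))
print(x_matches, y_matches)
```

The trade-off is that every epoch then sees the same order. If a fresh shuffle per epoch matters, another option is to keep the (x, y) tuples together through the pipeline and only separate them at the point where the consuming code needs them.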
