tf.data API: using 'flat_map' to unpack a dictionary of VariantDatasets after the window method


I am trying to use the flat_map method to unpack the sub VariantDatasets in my feature dictionary, but I don't understand what function needs to be passed to flat_map to successfully get a time series dataset for training.

I define the dataset and map the function to split into features and labels:

    self.data_ds = tf.data.experimental.CsvDataset(
        ['TData/train.csv'], self.defaults, header=True)
    print(list(self.data_ds.take(1)))
    self.data_ds = self.data_ds.map(self._parse_csv_row).batch(4)
    self.window_data(self.data_ds)

self._parse_csv_row is defined as:

    def _parse_csv_row(self, *vals):
        feat_vals = vals[:16]
        features = dict(zip(self.column_names[:-1], feat_vals))
        # features = feat_vals
        class_label = vals[16]
        # class_label = {self.column_names[-1]: class_label}

        return features, class_label

self.window_data is defined below, and the problem seems to lie with the function passed to the flat_map method. Essentially, I am struggling to understand how to unpack the windows correctly.

    def window_data(self, data_ds):
        data_ds = data_ds.window(self.window_size, shift=self.shift, drop_remainder=True)
        for sub_ds in data_ds.take(1):
            print(sub_ds)
        data_ds = data_ds.flat_map(lambda xs, y: {key: window.batch(self.window_size) for key, window in xs})

        return data_ds

The print(sub_ds) gives this output:

({'close': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'volume': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'pricechange': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'sma': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macd': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macdsignal': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macdhist': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'upperband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'middleband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'lowerband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'rsi': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'slowk': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'slowd': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'cci': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'adx': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'atr': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>}, <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>)

Everything looks fine up to this point.

I've more or less been guessing at what I need to pass as the function. The function requires two arguments, xs and y. I found the function currently used in the flat_map call here: https://github.com/tensorflow/tensorflow/issues/39414. I had to add y because the lambda was expected to take 2 arguments.
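For reference, here is a minimal sketch of how such a function might look. It rests on a few assumptions that are not in the code above: the rows are windowed before any call to batch, the feature dict is iterated with .items(), and the per-feature windows are recombined with tf.data.Dataset.zip so that the function returns a Dataset (which flat_map requires). The column names, window size and data are made-up stand-ins.

```python
import tensorflow as tf

# Hypothetical stand-in for the parsed CSV rows: two feature columns and a label.
features = {
    "close": tf.range(10, dtype=tf.float32),
    "volume": tf.range(10, 20, dtype=tf.float32),
}
labels = tf.constant([0.0, 1.0] * 5)
rows = tf.data.Dataset.from_tensor_slices((features, labels))


def window_data(data_ds, window_size, shift):
    # Each element becomes (dict of sub-datasets, sub-dataset of labels).
    windows = data_ds.window(window_size, shift=shift, drop_remainder=True)

    def to_tensors(xs, y):
        # Turn every per-feature sub-dataset into a single tensor of length window_size.
        feats = {
            key: window.batch(window_size, drop_remainder=True)
            for key, window in xs.items()
        }
        labs = y.batch(window_size, drop_remainder=True)
        # flat_map needs a Dataset back, so zip the nested structure together.
        return tf.data.Dataset.zip((feats, labs))

    return windows.flat_map(to_tensors)


windowed = window_data(rows, window_size=4, shift=2)
for feats, labs in windowed.take(1):
    print({key: value.shape for key, value in feats.items()}, labs.shape)
```

Each element of the resulting dataset is a (features, labels) pair in which every feature and the label are tensors with one entry per time step in the window.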

I am essentially trying to fit this data to the model below. The final dataset should be windowed with a window size of 300, and there are 16 features. I will increase the small batch size once this is working; for now it makes debugging easier.

numeric_features = [tf.feature_column.numeric_column(feat) for feat in data.column_names[:-1]]
feature_layer = tf.keras.layers.DenseFeatures(numeric_features)
# output_bias = tf.keras.initializers.Constant(init_bias)
model1 = Sequential([
    feature_layer,
    LSTM(units=64, stateful=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=8, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(0.16)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=1, activation='sigmoid') # , bias_initializer=output_bias)
])
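An LSTM layer expects input shaped (batch, timesteps, features), so one possible way to get from a windowed feature dict to that shape is to stack the per-feature tensors along the last axis. The sketch below is only an illustration under assumptions: feature_names is a hypothetical two-column subset of the 16 columns, the window length is 4 instead of 300, and taking the last label in each window as the target is my own choice, not something stated above.

```python
import tensorflow as tf

# Hypothetical subset of the feature columns; the real data has 16.
feature_names = ["close", "volume"]

# Stand-in for an already windowed dataset: 3 windows of 4 time steps each.
windowed = tf.data.Dataset.from_tensor_slices((
    {"close": tf.zeros((3, 4)), "volume": tf.ones((3, 4))},
    tf.zeros((3, 4)),
))


def stack_features(feats, labels):
    # Stack (timesteps,) tensors into a single (timesteps, n_features) tensor.
    x = tf.stack([feats[name] for name in feature_names], axis=-1)
    # Assumption: the label of the last time step is the target for the window.
    return x, labels[-1]


# drop_remainder=True keeps the batch dimension fixed across batches.
batched = windowed.map(stack_features).batch(2, drop_remainder=True)
for x, y in batched.take(1):
    print(x.shape, y.shape)
```

With the real data the same mapping would yield inputs of shape (batch, 300, 16).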