When attempting to use the flat_map method to unpack the sub _VariantDatasets in my feature dictionary, I struggle to understand what function to pass into flat_map to successfully get a time-series dataset for training.
I define the dataset and map a function that splits each row into features and a label:
self.data_ds = tf.data.experimental.CsvDataset(
    ['TData/train.csv'], self.defaults, header=True)
print(list(self.data_ds.take(1)))
self.data_ds = self.data_ds.map(self._parse_csv_row).batch(4)
self.window_data(self.data_ds)
The self._parse_csv_row is defined as:
def _parse_csv_row(self, *vals):
    feat_vals = vals[:16]
    features = dict(zip(self.column_names[:-1], feat_vals))
    # features = feat_vals
    class_label = vals[16]
    # class_label = {self.column_names[-1]: class_label}
    return features, class_label
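As a sanity check, printing the element spec right after the map (before the .batch(4)) should show a dict of scalar float32 features plus a scalar label; this is just a hypothetical debugging line, not part of my pipeline:

print(self.data_ds.map(self._parse_csv_row).element_spec)
# ({'close': TensorSpec(shape=(), dtype=tf.float32, ...), ...},
#  TensorSpec(shape=(), dtype=tf.float32, ...))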
The self.window_data method is defined below, and the problem seems to lie with the function passed to the flat_map method. Essentially, I am struggling to understand how to unpack this structure correctly.
def window_data(self, data_ds):
    data_ds = data_ds.window(self.window_size, shift=self.shift, drop_remainder=True)
    for sub_ds in data_ds.take(1):
        print(sub_ds)
    data_ds = data_ds.flat_map(lambda xs, y: {key: window.batch(self.window_size) for key, window in xs})
    return data_ds
The print(sub_ds) gives this output:
({'close': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'volume': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'pricechange': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'sma': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macd': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macdsignal': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'macdhist': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'upperband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'middleband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'lowerband': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'rsi': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'slowk': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'slowd': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'cci': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'adx': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>, 'atr': <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>}, <_VariantDataset element_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None)>)
And everything looks to be alright up until this point.
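If I understand correctly, each value in that dict is itself a small dataset holding one window's rows, which can be iterated eagerly for debugging (a sketch, using the 'close' key from the output above):

for features, label in data_ds.take(1):
    # Each value in the feature dict is a dataset of (up to) window_size elements.
    print(list(features['close'].as_numpy_iterator()))
    print(list(label.as_numpy_iterator()))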
I've more or less been guessing wildly at what the function should be. It requires two arguments, xs and y. I found the function currently in the flat_map call here: https://github.com/tensorflow/tensorflow/issues/39414, but I had to add y, since the lambda expected two arguments.
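My current best guess from the tf.data docs is that the function has to return a single tf.data.Dataset per window rather than a dict, so each inner dataset needs to be batched and the structure zipped back together. A sketch of what I mean, assuming the rows are windowed before any batching so each inner dataset yields scalars:

def _stack_window(self, features, label):
    # Batch each per-feature sub-dataset so one window of scalars becomes
    # a single tensor of shape (window_size,) per feature.
    feature_windows = {key: ds.batch(self.window_size) for key, ds in features.items()}
    # flat_map must return a tf.data.Dataset, not a dict, so zip the
    # nested structure back into one dataset of (features, label) pairs.
    return tf.data.Dataset.zip((feature_windows, label.batch(self.window_size)))

# in window_data, replacing the lambda:
data_ds = data_ds.flat_map(self._stack_window)

But I am not sure whether zip is the right tool here, or how the earlier .batch(4) interacts with the windowing.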
I am essentially trying to fit this data to the model below. The final dataset should be windowed with a window size of 300, and there are 16 features. The batch size is small to make debugging easier; I will increase it once things work.
numeric_features = [tf.feature_column.numeric_column(feat) for feat in data.column_names[:-1]]
feature_layer = tf.keras.layers.DenseFeatures(numeric_features)
# output_bias = tf.keras.initializers.Constant(init_bias)

model1 = Sequential([
    feature_layer,
    LSTM(units=64, stateful=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=8, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(0.16)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    Dense(units=1, activation='sigmoid')  # , bias_initializer=output_bias)
])
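Ultimately, the call I am trying to get working looks roughly like this (hypothetical names and settings; data is my wrapper object holding the dataset):

train_ds = data.window_data(data.data_ds).batch(4, drop_remainder=True)  # fixed batch size for the stateful LSTM
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model1.fit(train_ds, epochs=10)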