I want to train an autoencoder using Keras, where X_train is an m×n matrix and y_train is also an m×n matrix. For example:
X_train = np.array([[1, 2],
                    [3, 4]])
y_train = np.array([[5, 6],
                    [7, 8]])
I concatenate the two matrices into train_set and save the result to a single file, training.npy:
train_set = np.concatenate([X_train, y_train], axis=1)
print(train_set)
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
Then I upload it to S3:
training_path_input = sess.upload_data('/tmp/training.npy', key_prefix=prefix+'/training')
Now, when I fit the model:
model.fit({'train': training_path_input })
I wonder how the estimator will find the column indices for X_train and y_train, since y_train is not a vector, unlike in other cases. Is there any way to specify this in the fit() method? Or is there an alternative way to do it?
The fit method does two things: (1) it copies your data from training_path_input (on S3) to /opt/ml/input/data/<channel> on the SageMaker training instance (/opt/ml/input/data/train in your case), and (2) it launches your training code with any hyperparameters you specified. It is therefore up to your training code to know how to read the type of file you are copying to the machine: the script must load the copied files from that local directory and separate X_train from y_train itself.
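As a rough illustration, the loading step inside the training script could look something like the sketch below. It assumes the file is named training.npy and that the first n_inputs columns are X_train while the rest are y_train. The helper name load_xy and the parameter n_inputs are made up for this example and are not part of the SageMaker API. A temporary directory stands in for /opt/ml/input/data/train so the snippet can be run locally.

```python
import os
import tempfile

import numpy as np


def load_xy(data_dir, n_inputs, filename="training.npy"):
    """Load the concatenated array and split it back into X and y.

    Hypothetical helper: the first n_inputs columns are treated as X,
    the remaining columns as y.
    """
    data = np.load(os.path.join(data_dir, filename))
    return data[:, :n_inputs], data[:, n_inputs:]


# Local demonstration: a temporary directory plays the role of the
# channel directory that fit() would populate on the training instance.
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "training.npy"),
            np.array([[1, 2, 5, 6], [3, 4, 7, 8]]))
    X_train, y_train = load_xy(tmp_dir, n_inputs=2)

print(X_train)
print(y_train)
```

In the actual training script, data_dir would be the channel directory (/opt/ml/input/data/train here; the SageMaker framework containers usually also expose it through the SM_CHANNEL_TRAIN environment variable, though that is worth checking for your container), and the two arrays would then be passed to the Keras model's own fit(X_train, y_train) call. The number of input columns could be passed in as a hyperparameter instead of being hard-coded.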