Train autoencoder in script mode on AWS SageMaker


I want to train an autoencoder using Keras where X_train is an m×n matrix and y_train is also an m×n matrix. For example:

import numpy as np

X_train = np.array([[1, 2],
                    [3, 4]])
y_train = np.array([[5, 6],
                    [7, 8]])

I concatenate the two matrices into train_set and save it to a single file, training.npy:

train_set = np.concatenate([X_train, y_train], axis=1)
print(train_set)
# [[1 2 5 6]
#  [3 4 7 8]]
np.save('/tmp/training.npy', train_set)

Then I upload it to S3:

training_path_input = sess.upload_data('/tmp/training.npy', key_prefix=prefix+'/training')

Now, when I fit the model:

model.fit({'train': training_path_input })

I wonder how the estimator will know which columns belong to X_train and which to y_train, since y_train is a matrix here rather than a vector as in other cases. Is there any way to specify this in the fit() method?

Or is there any alternative way to do it?

There is 1 answer

Olivier Cruchant

The fit method does two things: (1) it copies your data from training_path_input (on S3) to /opt/ml/input/data/<channel> on the SageMaker training instance (/opt/ml/input/data/train in your case), and (2) it launches your code with any hyperparameters you specified. Your training code must therefore know how to read the type of files you are copying to the machine: it has to load them from that local directory and, in your case, split the array back into X_train and y_train itself.
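As a minimal sketch of that reading step, assuming the file is named training.npy and the first n_features columns are the inputs (the helper name load_xy and its parameters are hypothetical, not part of the SageMaker API), the training script could rebuild the two matrices like this:

```python
import os

import numpy as np


def load_xy(channel_dir, filename="training.npy", n_features=2):
    """Load the concatenated array and split it back into inputs and targets.

    Assumes the layout from the question: the first `n_features` columns
    are X and the remaining columns are y.
    """
    data = np.load(os.path.join(channel_dir, filename))
    X = data[:, :n_features]   # left block of columns: inputs
    y = data[:, n_features:]   # right block of columns: targets
    return X, y
```

In the entry-point script this might be called as load_xy(os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")), and the returned arrays passed to the Keras model's own fit(X_train, y_train). Saving the two matrices as separate files, or as named arrays in a single .npz file, would avoid having to pass the column split around.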