I am running the following command:
onmt_translate -model demo-model_step_100000.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose
The results in the file 'pred.txt' are completely unrelated to the source sentences given for translation.
The corpus size was 3000 parallel sentences. The preprocess command was:
onmt_preprocess -train_src EMMT/01engParallel_onmt.txt -train_tgt EMMT/01maiParallel_onmt.txt -valid_src EMMT/01engValidation_onmt.txt -valid_tgt EMMT/01maiValidation_onmt.txt -save_data EMMT/demo
Training used the demo model:
onmt_train -data EMMT/demo -save_model demo-model
You cannot get decent translations even on "seen" data because:
onmt_train -data EMMT/demo -save_model demo-model
trains a small (2 layers x 500 neurons) unidirectional RNN model (see the documentation for the defaults). The transformer model type is recommended to obtain state-of-the-art results. The FAQ says this about how to run a Transformer model training:
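For reference, here is the Transformer recipe from the OpenNMT-py FAQ, adapted to the paths above. It is reproduced from memory, so double-check the flags against your installed version; the save path and the single-GPU -world_size/-gpu_ranks values are assumptions for your setup (the FAQ example uses 4 GPUs):

onmt_train -data EMMT/demo -save_model demo-transformer \
    -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
    -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
    -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
    -world_size 1 -gpu_ranks 0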