OpenNMT translate command yields garbage results


I am running the following command

onmt_translate  -model demo-model_step_100000.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose

The results in the file 'pred.txt' are something completely different from the source sentences given for translation.

The corpus size was 3000 parallel sentences. The preprocess command was:

onmt_preprocess -train_src EMMT/01engParallel_onmt.txt -train_tgt EMMT/01maiParallel_onmt.txt -valid_src EMMT/01engValidation_onmt.txt -valid_tgt EMMT/01maiValidation_onmt.txt -save_data EMMT/demo

Training was done with the demo model settings:

onmt_train -data EMMT/demo -save_model demo-model

1 Answer

Answer by Wiktor Stribiżew:

You cannot get decent translations even on "seen" data because:

  • Your model was trained on far too few sentence pairs (3000 is nowhere near enough to train a good model). You can only get more or less meaningful translations with corpora of roughly 4M+ sentence pairs (and the more the better).
  • onmt_train -data EMMT/demo -save_model demo-model trains the small default model (a unidirectional RNN with 2 layers x 500 hidden units; see the documentation). The Transformer model type is recommended to obtain state-of-the-art results.

The FAQ says this about training a Transformer model:

The transformer model is very sensitive to hyperparameters. To run it effectively you need to set a bunch of different options that mimic the Google setup. We have confirmed the following command can replicate their WMT results.

python  train.py -data /tmp/de2/data -save_model /tmp/extra \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000  -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0  -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 4 -gpu_ranks 0 1 2 3

Here is what each of the parameters means:

param_init_glorot -param_init 0: correct initialization of parameters

position_encoding: add sinusoidal position encoding to each embedding
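For intuition, here is a minimal NumPy sketch of the sinusoidal position encoding from "Attention Is All You Need", which is roughly what -position_encoding adds to the token embeddings (an illustration, not OpenNMT's actual implementation):

import numpy as np

# Sinusoidal position encoding: even dimensions use sine, odd dimensions use cosine,
# with wavelengths forming a geometric progression up to 10000 * 2*pi.
def sinusoidal_position_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]                  # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # one angle per (position, dim pair)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                             # even indices: sine
    pe[:, 1::2] = np.cos(angles)                             # odd indices: cosine
    return pe

pe = sinusoidal_position_encoding(max_len=100, d_model=512)
print(pe.shape)  # (100, 512): one vector added to each token embedding, depending only on its position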

optim adam, decay_method noam, warmup_steps 8000: use a special learning rate schedule (linear warmup followed by decay).
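The "noam" schedule increases the learning rate linearly during warmup and then decays it with the inverse square root of the step count. A rough sketch of the formula is below (assuming -learning_rate acts as a constant multiplier, as in the paper's formulation; OpenNMT's exact scaling may differ slightly):

# Noam learning rate schedule (sketch): linear warmup, then ~1/sqrt(step) decay.
def noam_lr(step, model_dim=512, warmup_steps=8000, factor=2.0):
    step = max(step, 1)
    return factor * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for s in (100, 8000, 100000):
    print(s, noam_lr(s))   # the rate peaks around warmup_steps, then decays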

batch_type tokens, normalization tokens, accum_count 2: batch and normalize based on the number of tokens rather than sentences. Compute gradients based on two batches.
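Gradient accumulation sums the gradients of accum_count consecutive batches before a single optimizer step, simulating a larger effective batch. Here is a toy PyTorch sketch of that idea (the linear model and random data are stand-ins, not OpenNMT's training loop):

import torch
import torch.nn as nn

model = nn.Linear(8, 8)                            # stand-in for the real seq2seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
accum_count = 2
batches = [(torch.randn(16, 8), torch.randn(16, 8)) for _ in range(4)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches, start=1):
    loss = criterion(model(x), y) / accum_count    # scale so the sum matches one large batch
    loss.backward()                                # gradients accumulate in .grad
    if i % accum_count == 0:
        optimizer.step()                           # one parameter update per accum_count batches
        optimizer.zero_grad()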

label_smoothing 0.1: use label smoothing loss.
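If you only have a single GPU, which is likely given your setup, a reasonable starting point is to pass the same options to onmt_train, set -world_size 1 -gpu_ranks 0, and raise -accum_count to keep a comparable effective batch size. Treat this as a sketch to adapt rather than a guaranteed recipe, and keep in mind that with only 3000 sentence pairs you still should not expect usable translations:

onmt_train -data EMMT/demo -save_model demo-transformer \
        -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
        -encoder_type transformer -decoder_type transformer -position_encoding \
        -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
        -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 8 \
        -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
        -max_grad_norm 0 -param_init 0 -param_init_glorot \
        -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
        -world_size 1 -gpu_ranks 0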