This might be more of a math problem, but I couldn't find any relevant document elsewhere.
I just want to figure out which equation is used to calculate the alignment score in GIZA++.
Might anyone have an idea?
Thank you for your help in advance.
In short, word alignments and translation probabilities are learned over multiple iterations of the Expectation-Maximization (EM) algorithm.
Philipp Koehn's book "Statistical Machine Translation" has a chapter on word alignment. Check statmt.org for more information.
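To make the EM idea concrete, here is a minimal sketch of training for IBM Model 1, the simplest of the alignment models. It is not GIZA++'s actual code: the function names (`train_ibm_model1`, `viterbi_alignment`), the `<NULL>` token, the toy corpus, and the default `epsilon` are all assumptions made for illustration.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty-word token, placed at position 0


def train_ibm_model1(corpus, iterations=10):
    """corpus: list of (f_tokens, e_tokens). Returns t[(f, e)] = t(f | e)."""
    f_vocab = {f for f_sent, _ in corpus for f in f_sent}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e
        # E-step: distribute each word f over the words e it may align to.
        for f_sent, e_sent in corpus:
            e_words = [NULL] + list(e_sent)
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def viterbi_alignment(f_sent, e_sent, t, epsilon=1.0):
    """Best Model 1 alignment (indices into [NULL] + e_sent) and its probability."""
    e_words = [NULL] + list(e_sent)
    alignment = []
    score = epsilon / (len(e_words) ** len(f_sent))
    for f in f_sent:
        best = max(range(len(e_words)), key=lambda i: t[(f, e_words[i])])
        alignment.append(best)
        score *= t[(f, e_words[best])]
    return alignment, score


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus, iterations=20)
alignment, score = viterbi_alignment(["das", "Haus"], ["the", "house"], t)
```

Under Model 1 each word is aligned independently, so the best alignment can be found word by word. The higher models add position and fertility terms, which makes both the E-step and the search more involved.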
If it helps, I found this document, which includes the following description:
Following up on that reference leads to a paper entitled "The Mathematics of Statistical Machine Translation: Parameter Estimation", which you can find in PDF format here.
The paper gives details of the math underlying the 5 alignment models and is too verbose to paste here. Perhaps you can see if this is sufficiently detailed in its description of Model 4, which is what I assume is used by GIZA++.
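For orientation, this is the Model 1 form of the alignment probability as I understand it from that paper; it is the simplest case, not the Model 4 formula. Here f = f_1 ... f_m is the sentence being generated, e = e_1 ... e_l is the other sentence with e_0 as the empty word, a_j is the position in e that f_j is aligned to, t(f | e) is the word translation probability, and ε is a small constant for the length probability.

```latex
\Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t\left(f_j \mid e_{a_j}\right),
\qquad
\hat{\mathbf{a}} = \operatorname*{arg\,max}_{\mathbf{a}} \Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
```

My assumption is that the score GIZA++ reports for a sentence pair is the probability of the best alignment under whichever model was trained last, so with Model 4 the product would also include fertility and distortion terms, as described in the paper.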
There is also this PDF, which summarises the models and training process.