I am using the Moses toolkit for my translation system. I trained it on an Assamese-English parallel corpus, but some proper nouns are not translated because my parallel corpus is very small. So I want to add a transliteration step to my translation system.
I am using this command for translation: echo 'কানাদা এখন বিশাল দেশ ।' | ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini
This gave me the output "কানাদা is a vast country".
This is because the word "কানাদা" is not in my parallel corpus.
So I took a parallel list of words in Assamese and English and broke each word up character-wise, so that each line of the two files contains a single word with a space between each character (or each syllable). I used these two files to train a system as a normal translation task.
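For illustration, the character-splitting step described above could look roughly like the sketch below. It assumes a UTF-8 word list with one word per line, and the sub name `space_out` is made up for this example. It splits on Unicode code points, which matches the "ক া ন া দ া" form shown in this post.

```perl
#!/usr/bin/perl
# Hypothetical helper (not part of Moses): put a space between the
# characters of every word, one word per input line.
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # decode STDIN, encode STDOUT as UTF-8

# Split a word into Unicode code points and join them with single spaces.
sub space_out {
    my ($word) = @_;
    return join ' ', split //, $word;
}

while (my $line = <STDIN>) {
    chomp $line;
    next if $line eq '';              # skip empty lines
    print space_out($line), "\n";
}
```

Splitting with `$word =~ /\X/g` instead should keep vowel signs attached to their consonant, which is closer to the "each syllable" variant, but either way the same splitting has to be used for training and for decoding.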
Then I used the following command: echo 'কানাদা এখন বিশাল দেশ ।' | ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini | ./space.pl
This gave me the output "ক া ন া দ া is a vast country"
I had to break the word up because I trained the transliteration system character-wise.
Then I ran the output through the transliteration system that I trained, using the command:
echo 'কানাদা এখন বিশাল দেশ ।'| ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini | ./space.pl | ~/mymoses/bin/moses -f ~/work1/train/model/moses.ini
This gave me the output "c a n a d a is a vast country"
The characters are transliterated, but the only problem is the spaces inside the word. So I want to use a Perl script that will join the word back together. My final command will be:
echo 'কানাদা এখন বিশাল দেশ ।'| ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini | ./space.pl | ~/mymoses/bin/moses -f ~/work1/train/model/moses.ini | ./join.pl
Please help me with this "join.pl" file.
 
                        
How about:
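A minimal sketch of such a `join.pl`, under one assumption: the transliterated word arrives as a run of single-character tokens, so every run of consecutive one-letter tokens is glued into one word. The sub name `join_chars` is made up for this example. A genuine one-letter word (such as "a") that sits directly next to the run would be glued on as well, so treat this as a heuristic.

```perl
#!/usr/bin/perl
# join.pl (sketch): glue runs of single-character tokens back into one word.
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # decode STDIN, encode STDOUT as UTF-8

sub join_chars {
    my ($line) = @_;
    my @out;
    my $run = '';                     # single-letter tokens collected so far
    for my $tok (split ' ', $line) {
        if ($tok =~ /^[\p{L}\p{M}]$/) {
            $run .= $tok;             # one letter or mark: extend the run
        } else {
            push @out, $run if $run ne '';
            $run = '';
            push @out, $tok;          # longer tokens and punctuation pass through
        }
    }
    push @out, $run if $run ne '';    # flush a run at the end of the line
    return join ' ', @out;
}

# "c a n a d a is a vast country" becomes "canada is a vast country"
while (my $line = <STDIN>) {
    chomp $line;
    print join_chars($line), "\n";
}
```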
Output:
You can use it in your program, just change the while loop to:
But I think you wish to do:
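A rough sketch of that idea, transliterating character by character through a lookup hash. The hash `%translit` and the sub `transliterate` are names made up for this example, and the four entries are placeholders that only cover the example word. They are not a real Assamese-to-Latin table.

```perl
#!/usr/bin/perl
# Sketch: transliterate character by character through a lookup hash.
use strict;
use warnings;
use utf8;                             # this file contains UTF-8 literals
use open qw(:std :encoding(UTF-8));   # decode STDIN, encode STDOUT as UTF-8

# Placeholder mapping, just enough for the example word.
my %translit = (
    'ক' => 'c',
    'া' => 'a',
    'ন' => 'n',
    'দ' => 'd',
);

sub transliterate {
    my ($text) = @_;
    my $out = '';
    for my $ch (split //, $text) {
        # Replace mapped characters; leave everything else untouched.
        $out .= exists $translit{$ch} ? $translit{$ch} : $ch;
    }
    return $out;
}

while (my $line = <STDIN>) {
    chomp $line;
    print transliterate($line), "\n";
}
```

Because unmapped characters pass through unchanged, already translated English words are not affected and no joining step is needed afterwards.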
Output:
NB: It's up to you to build the true corresponding hash. I don't know anything about Assamese characters.