I want to convert text to speech from a document where multiple languages are included. When I am trying to do the following code, I fetch problems to record each language clearly. How can I save such type mixer text-audio clearly?
from gtts import gTTS
mytext = 'Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ'
language = 'ar' # arabic
myobj = gTTS(text=mytext, tld='co.in', lang=language, slow=False)
myobj.save("audio.mp3")
It's not enough to use just text to speech, since it can work with one language only.
To solve this problem we need to detect language for each part of the sentence.
Then run it through text to speech and append it to our final spoken sentence.
It would be ideal to use some neural network (there are plenty) to do this categorization for You.
Just for a sake of proof of concept I used
googletransto detect language for each part of the sentences andgttsto make a mp3 file from it.It's not bullet proof, especially with arabic text.
googletranssomehow detect different language code, which is not recognized bygtts. For that reason we have to use code_table to pick proper language code that works with gtts.Here is working example:
It's obviously not fast and won't fit for longer text.
Also it needs better tokenizer and proper error handling.
Again, it's just proof of concept.