I am using the Azure SpeechSynthesizer library in Python, and I have written code that converts some text to speech. I am finding that I need to call get() on the result before any speech synthesis actually happens, but this get() call blocks until synthesis is complete.
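import azure.cognitiveservices.speech as speechsdk

# speech_config is assumed to be a speechsdk.SpeechConfig created elsewhere.
pull_stream = speechsdk.audio.PullAudioOutputStream()
stream_config = speechsdk.audio.AudioOutputConfig(stream=pull_stream)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=stream_config)
result = speech_synthesizer.speak_text_async(text)
result.get()  # blocks here until synthesis has finished
del speech_synthesizer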
If I don't call result.get(), I can't pull any data from the stream. But when I do call it, it blocks for several seconds while the text is converted to speech. I also ran this with an AudioOutputConfig that takes a filename, so the output is saved to a wave file, and the timing is about the same. So it appears to be doing the same work whether I take the output as a stream or as a file.
Are there any pointers on how to make this work asynchronously, so that I can pull from the stream while the text is still being synthesized instead of waiting until synthesis completes?
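One possible direction, offered as a sketch rather than a confirmed fix: the Python SDK also exposes start_speaking_text_async() and AudioDataStream. As I understand it, calling get() on the future from start_speaking_text_async() returns once synthesis has started, and AudioDataStream.read_data() can then be called in a loop to pick up audio as it arrives. The helper names iter_audio_chunks and stream_synthesis below are hypothetical, the 16000-byte chunk size is an arbitrary choice, and passing audio_config=None is assumed to keep the audio in memory instead of playing it on the default speaker.

```python
from typing import Callable, Iterator


def iter_audio_chunks(read_data: Callable[[bytes], int],
                      chunk_size: int = 16000) -> Iterator[bytes]:
    """Yield chunks from a read_data(buffer) -> filled_size callable.

    Stops when read_data reports that no more bytes were written.
    """
    while True:
        # A fresh buffer per read, so a chunk that was already yielded
        # is never overwritten by a later read.
        buffer = bytes(chunk_size)
        filled = read_data(buffer)
        if filled <= 0:
            break
        yield buffer[:filled]


def stream_synthesis(text: str, speech_config) -> Iterator[bytes]:
    """Hypothetical wrapper: yield audio while synthesis is still running."""
    # Imported here so iter_audio_chunks can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: audio_config=None keeps the audio in memory only.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                              audio_config=None)
    # Assumption: this get() returns when synthesis starts, not when it ends.
    result = synthesizer.start_speaking_text_async(text).get()
    audio_stream = speechsdk.AudioDataStream(result)
    yield from iter_audio_chunks(audio_stream.read_data)
```

With an existing speech_config, this would be consumed as `for chunk in stream_synthesis(text, speech_config): ...`, handling each chunk as soon as it is read. Whether the first chunk arrives noticeably sooner than with speak_text_async(text).get() depends on the service and the voice, so it is worth timing both.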
I tried the following code, which calls result = speech_synthesizer.speak_text_async(text).get() with a .wav file as the output, and it converted the text to speech successfully.
Code :
Output :
The code below successfully converted the text to speech output as follows.