I'm having Twilio call into a Zoom call and then interact with the other callers. It works fine, but the quality of the speech is fairly low. I tried Amazon Polly Neural and Google WaveNet voices, but they sound much worse than when I listen to the same voices on the AWS and Google websites. My understanding is that Twilio downgrades the audio for phone calls to 8 kHz. Is there a way to increase the quality of the speech?
I have tried different voices.
Twilio's developer advocate here.
Usually the audio streams are calls from the public switched telephone network (PSTN), which uses PCM at 8 kHz.
At present, below are the formats that Twilio supports:
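Twilio converts everything to 8-bit audio, with the sample rate chosen to match the model (usually 8 kHz or 16 kHz). However, standard telephone bandwidth is limited to roughly the 300 Hz to 3.4 kHz range, and an 8 kHz sample rate cannot carry anything above 4 kHz. That band is designed for voice and gives acceptable voice quality, which is why the same voices sound fuller on the AWS and Google websites.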
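To illustrate why an 8 kHz sample rate caps what you can hear, here is a minimal Python sketch of the sampling limit. It is not Twilio code; the helper names `nyquist_hz` and `sample_tone` are made up for the example. It shows that a 5 kHz tone sampled at 8 kHz yields exactly the same samples (with flipped sign) as a 3 kHz tone, so content above 4 kHz cannot survive on such a stream.

```python
import math

SAMPLE_RATE = 8000  # Hz, the telephony sample rate discussed above


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate / 2


def sample_tone(freq_hz: float, n_samples: int, rate: int = SAMPLE_RATE) -> list:
    """Sample a pure sine tone of freq_hz at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


above_limit = sample_tone(5000, 16)  # 5 kHz is above the 4 kHz limit
alias = sample_tone(3000, 16)        # 3 kHz is inside the limit

# Once sampled, the 5 kHz tone is just an inverted 3 kHz tone.
indistinguishable = all(abs(a + b) < 1e-9 for a, b in zip(above_limit, alias))

print(nyquist_hz(SAMPLE_RATE))
print(indistinguishable)
```

The high-frequency detail that makes neural voices sound crisp in a browser demo sits largely above this limit, so it is lost before the audio reaches the call.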
So the payload must be audio/x-mulaw encoded at a sample rate of 8000 Hz and then Base64 encoded.
So, for Twilio Programmable Voice and its recording function, I'm afraid 8 kHz is the only sample rate available at this time.
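For anyone producing that payload themselves, here is a rough Python sketch of the encoding step. It assumes you already have 16-bit signed little-endian mono PCM at 8000 Hz; any resampling from a higher rate has to happen first and is not shown. The function names `linear_to_mulaw` and `pcm16_to_mulaw_payload` are hypothetical, and the encoder is a plain implementation of standard G.711 mu-law companding, not code taken from Twilio.

```python
import base64
import struct

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that stays in range after adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample into one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent is the position of the highest set bit, from bit 14 down to bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_mulaw_payload(pcm: bytes) -> str:
    """Turn 16-bit little-endian PCM (assumed 8000 Hz mono) into a Base64 mu-law string."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mulaw = bytes(linear_to_mulaw(s) for s in samples)
    return base64.b64encode(mulaw).decode("ascii")


# 20 ms of silence at 8000 Hz is 160 samples.
silence = struct.pack("<160h", *([0] * 160))
payload = pcm16_to_mulaw_payload(silence)
```

Note that this only changes the container format. It does not recover any quality, because the 8 kHz limit applies before the encoding step.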
For more details, please see the article: Best Practices for Audio Recordings
I know this is not the answer you're looking for, but I wanted to make sure I had a proper explanation to share with you for this condition.