I have been trying to stream a mulaw media stream back to Twilio. The requirement is that the payload must be audio/x-mulaw with a sample rate of 8000 Hz, base64 encoded.
My input comes from @google-cloud/text-to-speech as LINEAR16 (per the Google docs).
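One option, assuming the Node client's `TextToSpeechClient.synthesizeSpeech` method, is to ask Google for 8 kHz directly through `audioConfig.sampleRateHertz`, so nothing has to be resampled locally. A sketch of the request object; the text and `languageCode` are placeholder values:

```javascript
// Request object for TextToSpeechClient.synthesizeSpeech() (@google-cloud/text-to-speech).
// The input text and languageCode are placeholders.
const request = {
  input: { text: 'Hello from the other end of the call' },
  voice: { languageCode: 'en-US' },
  audioConfig: {
    audioEncoding: 'LINEAR16', // 16-bit signed little-endian PCM, returned with a WAV header
    sampleRateHertz: 8000,     // match Twilio's 8 kHz so no local resampling is needed
  },
};

// Usage (needs the library and credentials, so it is left commented out):
// const textToSpeech = require('@google-cloud/text-to-speech');
// const client = new textToSpeech.TextToSpeechClient();
// const [speechResponse] = await client.synthesizeSpeech(request);
// speechResponse.audioContent then holds the WAV bytes.
```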
I tried the wavefile package.
This is how I encoded the response from @google-cloud/text-to-speech:
const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()
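`wav.toSampleRate(8000)` above is the resampling step. If the synthesized audio arrives at a higher rate (Google voices often use 24 kHz), a dependency-free stand-in is sketched below. It assumes an integer ratio and averages each group of input samples, which is only a crude low-pass filter; `downsampleInt16` is a hypothetical helper, not part of wavefile:

```javascript
// Crude integer-ratio downsampler for 16-bit PCM, e.g. 24000 Hz -> 8000 Hz.
// Each output sample is the average of `factor` consecutive input samples
// (a box filter), which gives only weak anti-aliasing.
function downsampleInt16(samples, srcRate, dstRate = 8000) {
  if (srcRate % dstRate !== 0) {
    throw new Error(`only integer ratios supported, got ${srcRate} -> ${dstRate}`);
  }
  const factor = srcRate / dstRate;
  const out = new Int16Array(Math.floor(samples.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += samples[i * factor + j];
    }
    out[i] = Math.round(sum / factor);
  }
  return out;
}

// Example: every three 24 kHz samples collapse into one 8 kHz sample.
const example = downsampleInt16(Int16Array.from([300, 600, 900, -300, -600, -900]), 24000);
console.log(Array.from(example)); // [ 600, -600 ]
```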
Then I send the result back to Twilio over the WebSocket:
twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload: wav.toBase64(),
  },
  streamSid: meta.streamSid,
}))
The problem is that we only hear random noise on the other end of the Twilio call, so it seems the encoding is not correct.
I have also checked the @google-cloud/text-to-speech output by saving it to a file, and the audio was correct and clear.
Can anyone please help me with the encoding?
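One cause worth ruling out: as far as I know, `wav.toBase64()` serializes the entire WAV file, header included, while Twilio's docs expect the `payload` to be raw mu-law bytes with no file header. A small check on the outgoing string (`payloadHasWavHeader` is a hypothetical helper name):

```javascript
// Returns true when a base64 payload decodes to a RIFF/WAVE file, i.e. it
// still carries a WAV header instead of being raw mu-law bytes.
function payloadHasWavHeader(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  return (
    bytes.length >= 12 &&
    bytes.toString('ascii', 0, 4) === 'RIFF' &&
    bytes.toString('ascii', 8, 12) === 'WAVE'
  );
}

// Usage against the payload from the question:
// console.log(payloadHasWavHeader(wav.toBase64()));
```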
I just had the same problem. The solution is to convert the LINEAR16 audio to the mulaw codec by hand.
You can use the code from a music library.
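For reference, mu-law compresses a sample $x$ normalized to $[-1, 1]$ along a logarithmic curve. With $\mu = 255$ (the value used by G.711 mu-law), the continuous form is below; G.711 itself stores a piecewise-linear, 8-segment approximation of this curve in one byte per sample:

$$
F(x) = \operatorname{sgn}(x)\,\frac{\ln\left(1 + \mu\,|x|\right)}{\ln\left(1 + \mu\right)}, \qquad -1 \le x \le 1,\quad \mu = 255
$$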
I turned this into a function that converts a LINEAR16 byte array to mulaw.
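A minimal sketch of such a converter, assuming 16-bit signed little-endian input (which is what LINEAR16 is) and the classic G.711 mu-law encoding with bias 0x84 and clip level 32635. The function names are hypothetical, and this is not necessarily the exact code from the library mentioned above:

```javascript
// Sketch of a LINEAR16 -> G.711 mu-law encoder (one output byte per sample).
const MULAW_BIAS = 0x84;  // 132, added to the magnitude before the segment search
const MULAW_CLIP = 32635; // largest magnitude that still fits after adding the bias

// Encode one signed 16-bit sample as an 8-bit mu-law byte.
function encodeMulawSample(sample) {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;

  // Exponent (segment 0..7) = position of the highest set bit among bits 7..14.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  // Mantissa = the 4 bits just below the highest set bit.
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Convert a Buffer/Uint8Array of 16-bit little-endian PCM samples to mu-law bytes.
function linear16ToMulaw(pcm) {
  const buf = Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const sampleCount = Math.floor(buf.length / 2);
  const out = Buffer.alloc(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = encodeMulawSample(buf.readInt16LE(i * 2));
  }
  return out;
}

// Usage (pcmBytes is a hypothetical name for the PCM after the WAV header):
// const payload = linear16ToMulaw(pcmBytes).toString('base64');
```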
You can then use this on raw PCM (LINEAR16). You just need to strip the bytes at the beginning of the Google stream, since Google prepends a WAV header. Then base64-encode the resulting buffer and send it to Twilio.
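Rather than cutting a fixed number of bytes, the header can be skipped by walking the RIFF chunks until the `data` chunk, which also exposes the sample rate for a sanity check. A sketch, with `parseWav` as a hypothetical helper name:

```javascript
// Walk the RIFF chunks of a WAV buffer and return the format fields plus the
// raw sample bytes of the "data" chunk (i.e. the audio with the header stripped).
function parseWav(wav) {
  const buf = Buffer.from(wav.buffer, wav.byteOffset, wav.byteLength);
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE buffer');
  }
  let format = null;
  let offset = 12; // first chunk starts after "RIFF" + size + "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      format = {
        audioFormat: buf.readUInt16LE(body), // 1 = PCM, 7 = mu-law
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      // Clamp in case the declared size runs past the end of the buffer.
      const end = Math.min(body + size, buf.length);
      return { format, data: buf.subarray(body, end) };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no data chunk found');
}
```

For Google's LINEAR16 output, `parseWav(speechResponse.audioContent)` should report `audioFormat` 1 and `bitsPerSample` 16, and checking that `sampleRate` is 8000 before encoding catches a rate mismatch as well. The `data` slice is what goes through the mulaw conversion, and the converted bytes are what get base64-encoded into `media.payload`.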