Encode LINEAR16 audio to Twilio media audio/x-mulaw | NodeJS


I have been trying to stream mu-law audio back to Twilio over a media stream. The requirement is that the payload must be audio/x-mulaw with a sample rate of 8000 Hz, base64 encoded.

My input comes from @google-cloud/text-to-speech in LINEAR16 format (see the Google docs).

I tried the wavefile package.

This is how I encoded the response from @google-cloud/text-to-speech

const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()

Then I send the result back to Twilio via WebSocket

twilioWebsocket.send(JSON.stringify({
  event: 'media',
  media: {
    payload: wav.toBase64(),
  },
  streamSid: meta.streamSid,
}))

The problem is that we only hear random noise on the other end of the Twilio call, so the encoding seems to be wrong.

I also checked the @google-cloud/text-to-speech output by saving it to a file, and it played back clearly.

Can anyone please help me with the encoding?


There are 2 answers

TheDome On

I just had the same problem. The solution is to convert the LINEAR16 samples to mu-law by hand.

You can use the code from a music library.

I created a function out of this to convert a linear16 byte array to mulaw:

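short2ulaw(b: Buffer): Buffer {
    // 16-bit linear PCM to 8-bit mu-law -> the output buffer is half the size.
    // LINEAR16 uses 2 bytes per sample, so the input length should ALWAYS be even.
    const returnbuffer = Buffer.alloc(b.length / 2)

    for (let i = 0; i < b.length / 2; i++) {
      // JavaScript has no 16-bit integer type (every number is a 64-bit double),
      // so read each sample explicitly as a signed little-endian 16-bit value.
      let short = b.readInt16LE(i * 2)

      let sign = 0

      // Split the sample into a sign bit and a magnitude
      if (short < 0) {
        sign = 0x80
        short = -short
      }

      // Clip the magnitude so that adding the bias cannot overflow 15 bits
      short = short > 32635 ? 32635 : short

      const sample = short + 0x84
      // Assumes this.exp_lut is the standard 256-entry exponent table of the
      // reference mu-law encoder, indexed by bits 7..14 of the biased sample
      const exponent = this.exp_lut[(sample >> 7) & 0xff] & 0x7f
      const mantissa = (sample >> (exponent + 3)) & 0x0f
      // Keep all 8 bits after the inversion, including the sign bit
      let ulawbyte = ~(sign | (exponent << 4) | mantissa) & 0xff

      // Optional zero trap: never emit an all-zero byte
      ulawbyte = ulawbyte == 0 ? 0x02 : ulawbyte

      returnbuffer.writeUInt8(ulawbyte, i)
    }

    return returnbuffer
  }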

You can now use this on raw PCM (LINEAR16). Remember to strip the bytes at the beginning of the Google response first, since Google adds a WAV header. The audio must also already be at 8000 Hz, because this function only changes the encoding, not the sample rate. You can then base64-encode the resulting buffer and send it to Twilio.
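The steps described here (strip the WAV header, encode to mu-law, base64-encode, wrap in a media message) can be sketched roughly as follows. This is only a sketch under assumptions: all function names are hypothetical, the exponent is computed with a loop instead of a lookup table, the zero trap is left out, and the input is assumed to be mono LINEAR16 that is already at 8000 Hz (for example by requesting sampleRateHertz: 8000 in the text-to-speech audioConfig).

```javascript
// Hypothetical helpers, not part of any library.
const BIAS = 0x84;
const CLIP = 32635;

// Encode one signed 16-bit PCM sample as a G.711 mu-law byte.
function linearToMulawSample(sample) {
  let sign = 0;
  if (sample < 0) {
    sign = 0x80;
    sample = -sample;
  }
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;
  // The exponent is the position of the highest set bit, counted from bit 7.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Return only the contents of the "data" chunk of a RIFF/WAVE buffer.
// A buffer without a RIFF header is assumed to be raw PCM already.
function stripWavHeader(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF') return buf;
  let offset = 12; // skip "RIFF", the file size and "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'data') {
      return buf.subarray(offset + 8, Math.min(offset + 8 + size, buf.length));
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no data chunk found');
}

// Convert a buffer of little-endian 16-bit samples to mu-law bytes.
function linear16ToMulaw(pcm) {
  const out = Buffer.alloc(pcm.length >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = linearToMulawSample(pcm.readInt16LE(i * 2));
  }
  return out;
}

// Build the JSON string for a Twilio "media" message.
function toTwilioMediaMessage(audioContent, streamSid) {
  const pcm = stripWavHeader(Buffer.from(audioContent));
  return JSON.stringify({
    event: 'media',
    streamSid,
    media: { payload: linear16ToMulaw(pcm).toString('base64') },
  });
}
```

With the names from the question, usage would look like twilioWebsocket.send(toTwilioMediaMessage(speechResponse.audioContent, meta.streamSid)).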

lm_eldg On

I had the same problem. The error is in wav.toBase64(), which encodes the whole file, including the WAV header. Twilio media streams expect raw audio data, which you can get from wav.data.samples, so your code would be:

const wav = new wavefile.WaveFile(speechResponse.audioContent)
wav.toBitDepth('8')
wav.toSampleRate(8000)
wav.toMuLaw()

const payload = Buffer.from(wav.data.samples).toString('base64');
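To check which of the two payloads you are actually sending, a small guard like the one below can help while debugging. It is a hypothetical helper, not part of wavefile or Twilio, and it only recognizes the plain RIFF/WAVE container, so treat it as a rough sanity check rather than a full validator.

```javascript
// Hypothetical guard: returns true when a base64 payload still starts with a
// RIFF/WAVE container header, i.e. it is a whole WAV file and not raw samples.
function payloadHasWavHeader(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  return (
    bytes.length >= 12 &&
    bytes.toString('ascii', 0, 4) === 'RIFF' &&
    bytes.toString('ascii', 8, 12) === 'WAVE'
  );
}
```

A payload built from wav.toBase64() should make this return true, while one built from wav.data.samples should not.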