I am trying to transcribe a Twilio voice call in real time with WebSockets. Twilio has multiple examples for this. I am following this one: https://www.twilio.com/en-us/blog/live-transcribing-phone-calls-using-twilio-media-streams-and-google-speech-text
It works as expected. Basically, you call your Twilio number and whatever you say gets transcribed. Now I want to add a <Dial> flow to it so that when a customer calls, the call is connected to an agent (via <Dial>) and the whole conversation is transcribed.
The problem here is that only the stream of the caller is getting transcribed. The stream of the dialed agent is not being transcribed. I searched and tried quite a few things, but I am not able to get access to the audio stream of the dialed call via WebSocket.
Does anyone know how to do this?
Twilio Support Engineer here. In order for your WebSocket server to receive both the inbound audio track and the outbound audio track (the child call), you need to specify the track attribute of the <Stream> noun. For example:
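A minimal sketch of what that TwiML could look like, generated here with Python's standard library rather than the Twilio helper library. The function names, the WebSocket URL, and the agent number are placeholders of mine, not taken from the tutorial. It assumes the stream is started with `<Start><Stream>` before the `<Dial>`, and that each `media` message arriving on the WebSocket carries a `media.track` value of `inbound` or `outbound`.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(stream_url: str, agent_number: str) -> str:
    """Return TwiML that streams both audio tracks, then dials the agent."""
    response = ET.Element("Response")
    # <Start><Stream> forks the audio without blocking the next verb.
    start = ET.SubElement(response, "Start")
    # track="both_tracks" asks for the caller's audio and the dialed leg's audio.
    ET.SubElement(start, "Stream", {"url": stream_url, "track": "both_tracks"})
    dial = ET.SubElement(response, "Dial")
    dial.text = agent_number
    return ET.tostring(response, encoding="unicode")


def track_of(raw_message: str) -> Optional[str]:
    """Return the track name of a media message, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    return message.get("media", {}).get("track")


if __name__ == "__main__":
    # Placeholder values; substitute your own server URL and agent number.
    print(build_twiml("wss://example.com/media", "+15550100"))
```

On the WebSocket server, something like `track_of` can be used to route each payload to a separate speech recognition stream, one per track, so the caller and the agent get transcribed independently instead of being mixed into one recognizer.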