How do I align the start times of simultaneous Twilio video rooms audio recordings when the offset property isn't doing its job?

Here are the steps to reproduce my issue:

1. In Twilio Video Rooms, I start a call between two participants (a host and a guest).
2. I mute the guest's audio using track.disable().
3. I start recording the call using Twilio's recording API.
4. After 40 seconds I unmute the guest's audio.
5. After a further 20 seconds I stop recording.

Twilio generates two .mka audio files - one for each side of the call. Ideally I would like these to be the same length, with the guest recording starting with 40 seconds of silence. But the recordings are of different lengths - the host recording is about 40 seconds longer than the guest recording, which only seems to start when I enabled the guest's audio.

How do I find out the exact timestamp difference between the times that the two recordings start? Alternatively, can I get Twilio to start the audio recordings for the two participants simultaneously, even though one of them has its audio track disabled?

(Context: I want to do this because I want to use AWS Transcribe to generate transcripts for the two recordings, and then combine the two transcripts into a unified transcript. For the entries in the combined transcript to be in the correct order, I need to know the difference between the start times of the two recordings.)
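To illustrate what I'm trying to do, here is a minimal sketch of the merge step. It assumes I already know how many seconds later the guest recording starts than the host recording, which is exactly the number I can't currently get. The Segment class, merge_transcripts function, and sample values are hypothetical and only stand in for the transcript entries; they are not the actual AWS Transcribe output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start_s: float  # seconds from the start of that speaker's own recording
    text: str


def merge_transcripts(host, guest, guest_delay_s):
    """Shift guest segments onto the host timeline, then sort by start time.

    guest_delay_s is the (assumed known) number of seconds by which the
    guest recording starts later than the host recording.
    """
    shifted = [Segment(s.speaker, s.start_s + guest_delay_s, s.text) for s in guest]
    return sorted(host + shifted, key=lambda s: s.start_s)


# Hypothetical sample data: the guest recording starts 40 s after the host's.
host = [Segment("host", 1.0, "Hello"), Segment("host", 45.0, "Welcome back")]
guest = [Segment("guest", 2.5, "Hi, can you hear me?")]
merged = merge_transcripts(host, guest, guest_delay_s=40.0)
```

Without the correct delay, the guest's line at 2.5 seconds would be placed right after the host's first line instead of at 42.5 seconds on the shared timeline.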

According to https://www.twilio.com/docs/video/api/recordings-resource, a recording's offset property is: "The time in milliseconds elapsed between an arbitrary point in time, common to all group rooms, and the moment when the source room of this track started. This information provides a synchronization mechanism for recordings belonging to the same room." However, the offset property is the same for my two recordings - is this a Twilio bug?

There's also the date_created property of Twilio recordings, but it's given to the nearest second rather than the nearest millisecond, and I don't know how precisely it reflects the actual start time of the recording.
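For what it's worth, this is a rough sketch of how I could estimate the gap from date_created alone. It assumes the values arrive as ISO 8601 strings such as "2023-05-01T10:00:00Z" (made-up sample values), that each one is truncated or rounded to a whole second, and that date_created really reflects when the recording started, which I have not been able to confirm. The function names are my own.

```python
from datetime import datetime


def parse_twilio_date(value: str) -> datetime:
    # Assumption: ISO 8601 with a trailing "Z"; swap it for an explicit UTC
    # offset so datetime.fromisoformat accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def start_gap_bounds(host_created: str, guest_created: str):
    """Return (low, high) bounds in seconds for how much later the guest started.

    Each timestamp can be off by up to a second, so the difference is only
    known to within roughly +/- 1 second.
    """
    gap = (parse_twilio_date(guest_created) - parse_twilio_date(host_created)).total_seconds()
    return (gap - 1.0, gap + 1.0)


# Hypothetical date_created values for the two recordings.
bounds = start_gap_bounds("2023-05-01T10:00:00Z", "2023-05-01T10:00:40Z")
```

An uncertainty of up to a second in either direction is too coarse for ordering short transcript entries reliably, which is why I'm looking for a millisecond-level value.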
