Here are the steps to reproduce my issue:
In Twilio Video Rooms, I start a call between two participants (a host and a guest). I mute the guest's audio using track.disable(). I start recording the call using Twilio's recording API. After 40 seconds I unmute the guest's audio. After a further 20 seconds I stop recording.
Twilio generates two .mka audio files, one for each side of the call. Ideally I would like these to be the same length, with the guest recording starting with 40 seconds of silence. But the recordings are different lengths: the host recording is about 40 seconds longer than the guest recording, which only seems to start when I enable the guest's audio.
How do I find out the exact difference between the start times of the two recordings? Alternatively, can I get Twilio to start the audio recordings for both participants simultaneously, even though one of them has its audio track disabled?
(Context: I want to do this because I want to use AWS Transcribe to generate transcripts for the two recordings, and then combine the two transcripts into a unified transcript. For the entries in the combined transcript to be in the correct order, I need to know the difference between the start times of the two recordings.)
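To show why the start-time difference matters, here is a minimal Python sketch of the merge step. It assumes each AWS Transcribe result is the standard JSON output, where results["items"] holds words with "start_time" strings (in seconds from the start of that recording) and punctuation items with no "start_time". The names Segment, words_from_transcribe and merge_transcripts are hypothetical helpers, and guest_start_delay is exactly the unknown value I'm asking about.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    start: float  # seconds from the start of the *host* recording once merged
    speaker: str
    text: str


def words_from_transcribe(job: dict, speaker: str) -> List[Segment]:
    """Extract timed words from a Transcribe-style JSON result (assumed format).

    Punctuation items have no start_time, so they are skipped here.
    """
    words = []
    for item in job["results"]["items"]:
        if "start_time" not in item:
            continue
        words.append(
            Segment(float(item["start_time"]), speaker, item["alternatives"][0]["content"])
        )
    return words


def merge_transcripts(
    host: Iterable[Segment], guest: Iterable[Segment], guest_start_delay: float
) -> List[Segment]:
    """Merge two time-sorted word lists into one chronological transcript.

    guest_start_delay: seconds between the host recording starting and the
    guest recording starting (positive if the guest recording started later).
    """
    shifted_guest = (
        Segment(s.start + guest_start_delay, s.speaker, s.text) for s in guest
    )
    # heapq.merge keeps the combined output sorted, given sorted inputs.
    return list(heapq.merge(host, shifted_guest, key=lambda s: s.start))
```

For example, with a guest word at 1.0 s in the guest file and a 40-second delay, that word lands at 41.0 s, between host words at 0.5 s and 45.0 s. Without the correct delay, the guest's lines would be interleaved in the wrong place.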
According to https://www.twilio.com/docs/video/api/recordings-resource, a recording's offset property is: "The time in milliseconds elapsed between an arbitrary point in time, common to all group rooms, and the moment when the source room of this track started. This information provides a synchronization mechanism for recordings belonging to the same room." However, the offset property is the same for my two recordings. Is this a Twilio bug?
There's also the date_created property of Twilio recordings, but it's given to the nearest second rather than the nearest millisecond, and I don't know how precisely it reflects the actual start time of the recording.
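To quantify how far date_created alone gets me, here is a small Python sketch. It assumes both values are ISO 8601 strings with whole-second resolution (e.g. "2020-06-01T10:00:45Z"), and it only accounts for that rounding. Whether each value is truncated or rounded to the second, the error in each is under one second, so the error in the difference is under one second either way. It says nothing about any lag between date_created and the real start of the audio. start_delay_bounds is a hypothetical helper.

```python
from datetime import datetime
from typing import Tuple


def start_delay_bounds(host_created: str, guest_created: str) -> Tuple[float, float, float]:
    """Return (estimate, lower, upper) in seconds for guest start minus host start.

    Assumes whole-second ISO 8601 timestamps; the true difference lies
    strictly within one second of the estimate due to rounding alone.
    """
    # datetime.fromisoformat does not accept a trailing "Z" before Python 3.11.
    host = datetime.fromisoformat(host_created.replace("Z", "+00:00"))
    guest = datetime.fromisoformat(guest_created.replace("Z", "+00:00"))
    estimate = (guest - host).total_seconds()
    return estimate, estimate - 1.0, estimate + 1.0
```

With the timestamps "2020-06-01T10:00:05Z" and "2020-06-01T10:00:45Z" this gives (40.0, 39.0, 41.0), a window of up to two seconds, which is too coarse to order fast back-and-forth speech correctly.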