AVAssetWriterInput - Insufficient video frames for Captured Audio


I've got a moderately complicated AVAssetWriterInput setup that I'm using to be able to flip the camera while I'm recording. Basically I run two sessions: when the user taps to flip the camera, I disconnect session 1 from the output and attach session 2.
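Roughly, the flip looks like this (a simplified sketch, not my actual code; frontSession/backSession are placeholder names):

    import AVFoundation

    final class DualSessionCamera {
        // Both sessions are preconfigured elsewhere: frontSession with the
        // front camera input, backSession with the back camera input.
        let frontSession = AVCaptureSession()
        let backSession = AVCaptureSession()
        let videoOutput = AVCaptureVideoDataOutput()
        private(set) var activeSession: AVCaptureSession

        init() {
            activeSession = frontSession
        }

        // Detach the shared output from the active session, attach it to the other.
        func flipCamera() {
            let next = (activeSession === frontSession) ? backSession : frontSession

            activeSession.beginConfiguration()
            activeSession.removeOutput(videoOutput)
            activeSession.commitConfiguration()

            next.beginConfiguration()
            if next.canAddOutput(videoOutput) {
                next.addOutput(videoOutput)
            }
            next.commitConfiguration()

            activeSession = next
        }
    }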

This works really great. I can export the video and it plays just fine.

Now that I'm trying to do more advanced stuff with the resulting video, some problems are popping up: the AVAssetTracks inside the exported AVAsset are slightly mismatched in duration (always by less than 1 frame). Specifically, I'm following this tutorial: https://www.raywenderlich.com/6236502-avfoundation-tutorial-adding-overlays-and-animations-to-videos but a significant amount of the time an all-black frame appears for a split second, sometimes at the head of the video, sometimes at the tail. The mismatch varies, but it's always less than one frame (see logs below; one frame at 30 fps is 1/30 ≈ 0.0333 s).

I did a bit of back-and-forth debugging: I can record a video with my recorder that consistently produces a trailing black frame, BUT using the tutorial code I have not been able to create a video that produces one. I added similar logging (to what's pasted below) to the tutorial code and I'm seeing deltas of no greater than 2/100ths of a second, so roughly half a frame at most. It's even a perfect 0 on one occasion.
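For reference, the deltas above come from logging along these lines (a simplified sketch of the measurement, not my exact code):

    import AVFoundation

    func logDurations(for asset: AVAsset) {
        let assetDuration = asset.duration.seconds
        let videoDuration = asset.tracks(withMediaType: .video).first?.timeRange.duration.seconds ?? 0
        let audioDuration = asset.tracks(withMediaType: .audio).first?.timeRange.duration.seconds ?? 0
        print("BUFFER | asset duration: \(assetDuration)")
        print("BUFFER | video track duration: \(videoDuration)")
        print("BUFFER | Audio track duration: \(audioDuration)")
        print("BUFFER | Asset Delta: \(videoDuration - audioDuration)")
    }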

So my sense right now is that what's happening is: I record my video, both assetInputs start to gobble data, and when I say "stop" they stop. The video input stops at the last complete frame, and the audio input does the same. But since the audio input samples at a much higher rate than the video, the two aren't synced up perfectly and I end up with more audio than video. This isn't a problem until I compose an asset from the two tracks; the composition engine then assumes I mean "yes, actually use 100% of the time of all tracks even if there is a mismatch", which results in the black frame.

(Edit: This is basically what's happening - https://blender.stackexchange.com/questions/6268/audio-track-and-video-track-are-not-the-same-length)
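The arithmetic backs this up. Audio arrives in buffers of around 1024 samples (a common size, not guaranteed), so the audio track can overrun the last complete video frame by up to roughly one frame:

    let videoFrameDuration = 1.0 / 30.0        // ≈ 0.0333 s per frame at 30 fps
    let audioBufferDuration = 1024.0 / 48000.0 // ≈ 0.0213 s per buffer at 48 kHz
    // Whatever audio lands after the last complete video frame extends the
    // audio track past the video track; hence the sub-frame deltas
    // (0.005 to 0.027 s) in the logs above.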

I think the correct solution is, instead of fiddling with the composition's construction and timing, to make the captured audio and video match up as closely as possible. Ideally a delta of 0, but I'd be fine with anything around 1/10th of a frame.

So my question is: How do I make two AVAssetWriterInputs, one audio and one video, attached to an AVAssetWriter line up better? Is there a setting somewhere? Do I mess with the framerates? Should I just trim the exported asset to the length of the video track? Can I duplicate the last captured frame when I stop recording? Can I have it so that the inputs stop at different times - basically have the audio stop first and then wait for the video to 'catch up' and then stop the video? Something else? I'm at a loss for ideas here :|
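For the trimming option, I imagine something like this: export with the timeRange clamped to the video track so both tracks get cut to the same length (a sketch, untested; preset and output details are illustrative):

    import AVFoundation

    func exportTrimmedToVideoTrack(asset: AVAsset, to outputURL: URL,
                                   completion: @escaping (Error?) -> Void) {
        guard let videoTrack = asset.tracks(withMediaType: .video).first,
              let export = AVAssetExportSession(asset: asset,
                                                presetName: AVAssetExportPresetHighestQuality) else {
            completion(nil)
            return
        }
        export.outputURL = outputURL
        export.outputFileType = .mov
        // Clamp the export window to the video track so trailing audio is dropped.
        export.timeRange = videoTrack.timeRange
        export.exportAsynchronously {
            completion(export.error)
        }
    }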

MY LOGGING

BUFFER | Video SETTINGS: Optional(["AVVideoCompressionPropertiesKey": {
    AllowFrameReordering = 1;
    AllowOpenGOP = 1;
    AverageBitRate = 7651584;
    **ExpectedFrameRate = 30;**
    MaxKeyFrameIntervalDuration = 1;
    MaxQuantizationParameter = 41;
    MinimizeMemoryUsage = 1;
    Priority = 80;
    ProfileLevel = "HEVC_Main_AutoLevel";
    RealTime = 1;
    RelaxAverageBitRateTarget = 1;
    SoftMinQuantizationParameter = 18;
}, "AVVideoCodecKey": hvc1, "AVVideoWidthKey": 1080, "AVVideoHeightKey": 1920])

BUFFER | AUDIO SETTINGS Optional(["AVNumberOfChannelsKey": 1, "AVFormatIDKey": 1633772320, **"AVSampleRateKey": 48000**])


BUFFER | asset duration: 0.5333333333333333
BUFFER | video track duration: 0.5066666666666667
BUFFER | Audio track duration: 0.5333333333333333
**BUFFER | Asset Delta: -0.026666666666666616**

BUFFER | asset duration: 0.384
BUFFER | video track duration: 0.37333333333333335
BUFFER | Audio track duration: 0.384
**BUFFER | Asset Delta: -0.010666666666666658**

BUFFER | asset duration: 0.9405416666666667
BUFFER | video track duration: 0.935
BUFFER | Audio track duration: 0.9405416666666667
**BUFFER | Asset Delta: -0.005541666666666667**

TUTORIAL LOGGING

COMPOSE | asset duration: 0.7333333333333333
COMPOSE | video track duration: 0.7333333333333333
COMPOSE | audio track duration: 0.7316666666666667
**Delta: ~0.01667**

COMPOSE | asset duration: 1.3333333333333333
COMPOSE | video track duration: 1.3333333333333333
COMPOSE | audio track duration: 1.3316666666666668
**Delta: ~0.01667**

COMPOSE | asset duration: 1.0316666666666667
COMPOSE | video track duration: 1.0316666666666667
COMPOSE | audio track duration: 1.0316666666666667
**Delta: 0 (wow)**

There are 2 answers

nickneedsaname (Best Answer)

TL;DR - don't just call AVAssetWriter.finishWriting {}, because then T_End is the time of the last sample written (per the analysis above, usually an audio buffer). Instead, use AVAssetWriter.endSession(atSourceTime:) to set T_End to the time of the last written video frame.

AVCaptureVideoDataOutputSampleBufferDelegate TO THE RESCUE!!

Use AVCapture(Video|Audio)DataOutputSampleBufferDelegate to write buffers to the AVAssetWriter (attach delegates to AVCaptureVideoDataOutput and AVCaptureAudioDataOutput)
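The wiring looks roughly like this (a sketch; the queue label and setup details are illustrative):

    import AVFoundation

    let delegateQueue = DispatchQueue(label: "camera.buffer.delegate")
    let videoDataOutput = AVCaptureVideoDataOutput()
    let audioDataOutput = AVCaptureAudioDataOutput()

    func attachOutputs(to session: AVCaptureSession,
                       delegate: AVCaptureVideoDataOutputSampleBufferDelegate & AVCaptureAudioDataOutputSampleBufferDelegate) {
        // Both outputs deliver their buffers to the same delegate callback.
        videoDataOutput.setSampleBufferDelegate(delegate, queue: delegateQueue)
        audioDataOutput.setSampleBufferDelegate(delegate, queue: delegateQueue)
        if session.canAddOutput(videoDataOutput) { session.addOutput(videoDataOutput) }
        if session.canAddOutput(audioDataOutput) { session.addOutput(audioDataOutput) }
    }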

Once the session is started and your outputs are running, they will constantly spit data at this delegate.

  1. canWrite is a flag that tracks whether you should be recording (writing sampleBuffers to the AVAssetWriter) or not
  2. In order to prevent leading black frames we need to make sure the first frame written is a video frame. Until we get a video frame, we ignore all buffers even if we're recording. startSession(atSourceTime:) sets T0 for the asset, which we set to the time of the first video frame.
  3. Every time a video frame is written, record its time on a separate queue. This frees the delegate queue to do only frame processing/writing, and guarantees that stopping the recording (which is triggered from the main queue) won't have collisions or memory issues when reading lastVideoFrameWrite.
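For reference, the state those three steps rely on (names match the code below; the queue label is a guess):

    import AVFoundation

    final class WBufferCameraSessionController: NSObject {
        var canWrite = false                 // 1: toggled when recording starts/stops
        var sessionAtSourceTime: CMTime?     // 2: T0, taken from the first video frame
        var lastVideoFrameWrite: CMTime?     // 3: updated on finishRecordQueue per video frame
        static let finishRecordQueue = DispatchQueue(label: "finish.record.queue")
    }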

Now for the fun part!

  4. In order to prevent trailing black frames, have the AVAssetWriter end its session at T_lastVideoFrameTime. This discards all frames (audio and video) that were written after T_lastVideoFrameTime, ensuring that both asset tracks inside the AVAssetWriter are as synced up as possible.

RESULTS

BUFFER | asset duration: 1.8683333333333334
BUFFER | video track duration: 1.8683333333333334
BUFFER | Audio track duration: 1.868
BUFFER | Asset Delta: 0.0003333333333332966

BUFFER | asset duration: 1.435
BUFFER | video track duration: 1.435
BUFFER | Audio track duration: 1.4343333333333332
BUFFER | Asset Delta: 0.0006666666666668153

BUFFER | asset duration: 1.8683333333333334
BUFFER | video track duration: 1.8683333333333334
BUFFER | Audio track duration: 1.8682291666666666
BUFFER | Asset Delta: 0.00010416666666679397

BUFFER | asset duration: 1.435
BUFFER | video track duration: 1.435
BUFFER | Audio track duration: 1.4343541666666666
BUFFER | Asset Delta: 0.0006458333333334565

LOOK AT THOSE DELTAS!!!!! All sub-millisecond. Very nice.

CODE

To Record

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard CMSampleBufferDataIsReady(sampleBuffer) else {
            return
        }
        if output == audioDataOutput {
            // PROCESS AUDIO BUFFER
        }
        if output == videoDataOutput {
            // PROCESS VIDEO BUFFER
        }

        // 1
        let writable = canWrite
        let time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        if writable && sessionAtSourceTime == nil {
            // 2
            if output == videoDataOutput {
                sessionAtSourceTime = time
                videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
            } else {
                return
            }
        }

        if writable,
           output == videoDataOutput,
           videoWriterInput != nil,
           videoWriterInput.isReadyForMoreMediaData {
            // Write video buffer
            videoWriterInput.append(sampleBuffer)
            // 3
            WBufferCameraSessionController.finishRecordQueue.async {
                self.lastVideoFrameWrite = time
            }
        } else if writable,
                  output == audioDataOutput,
                  audioWriterInput != nil,
                  audioWriterInput.isReadyForMoreMediaData {
            // Write audio buffer
            audioWriterInput.append(sampleBuffer)
        }
        if output == videoDataOutput {
            bufferDelegate?.didOuputVideoBuffer(buffer: sampleBuffer)
        }
    }

Stop Recording

    func stopRecording() {
        guard isRecording else {
            return
        }
        guard isStoppingRecording == false else {
            return
        }
        isStoppingRecording = true
        WBufferCameraSessionController.finishRecordQueue.async {
            // 4
            if let lastVideoFrameWrite = self.lastVideoFrameWrite {
                self.videoWriter.endSession(atSourceTime: lastVideoFrameWrite)
            }
            self.videoWriter.finishWriting {
                // cleanup, do stuff with finished file if writing was successful
                ...
            }
            ...
        }
    }
Binh Ho

To remove the last frame, see nickneedsaname's answer.

To remove the first frame, we need to start the session only after the first video frame has arrived.

    // Previously the session started on the first buffer of any type:
    // if sessionAtSourceTime == nil {
    //     sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    //     videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
    // }

    if output == videoDataOutput, videoWriterInput.isReadyForMoreMediaData {
        // ---> MOVED TO HERE: the session now starts on the first video frame
        if sessionAtSourceTime == nil {
            sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
            videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
        }

        // Write video buffer
        videoWriterInput.append(sampleBuffer)
        self.lastVideoFrameWriteTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    } else if sessionAtSourceTime != nil, output == audioDataOutput, audioWriterInput.isReadyForMoreMediaData {
        // Write audio buffer
        self.audioWriterInput.append(sampleBuffer)
    }