I'm working on a project that involves grabbing H.264-encoded frames from VTCompressionSession on iOS 8, muxing them with live AAC or PCM audio from the microphone into a playable MPEG-2 transport stream (TS), and streaming that over a socket in real time with minimal delay (i.e. almost no buffering).

After watching the presentation on the new VideoToolbox APIs in iOS 8 and doing some research, I think it's safe to assume that:

  • The encoded frames you get from VTCompressionSession are not in Annex B format, so I need to convert them somehow. All of the explanations I've seen so far are too vague, so I'm not really sure how to do this (i.e. swap each NAL unit's length header for a 3- or 4-byte start code; see my first sketch after this list).

  • The encoded frames you get from VTCompressionSession are actually an Elementary Stream, so I would first need to turn them into a Packetized Elementary Stream before they can be muxed (see the second sketch after this list).

  • I would also need an AAC or PCM elementary stream from the microphone data (I presume PCM would be easier since no encoding is involved), which I don't know how to do either.

  • In order to mux the Packetized Elementary Streams I would also need some library like libmpegts. Or perhaps ffmpeg (by using libavcodec and libavformat libraries).
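
For what it's worth, here is my rough, untested understanding of what the Annex B conversion from the first bullet might look like (the helper name and the keyframe flag are just mine; I believe keyframes can be detected via the kCMSampleAttachmentKey_NotSync sample attachment):

#import <CoreMedia/CoreMedia.h>

// Rough sketch: convert one VTCompressionSession output buffer (AVCC, i.e.
// length-prefixed NAL units) into Annex B (start-code-prefixed NAL units),
// prepending SPS/PPS on keyframes. Assumes the usual 4-byte length prefix.
static NSData *AnnexBDataFromSampleBuffer(CMSampleBufferRef sampleBuffer, BOOL isKeyframe)
{
    static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};
    NSMutableData *annexB = [NSMutableData data];

    // On keyframes, emit the parameter sets (SPS/PPS) stored in the format description.
    if (isKeyframe) {
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t parameterSetCount = 0;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, NULL, NULL, &parameterSetCount, NULL);
        for (size_t i = 0; i < parameterSetCount; i++) {
            const uint8_t *parameterSet = NULL;
            size_t parameterSetSize = 0;
            CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, i, &parameterSet, &parameterSetSize, NULL, NULL);
            [annexB appendBytes:startCode length:sizeof(startCode)];
            [annexB appendBytes:parameterSet length:parameterSetSize];
        }
    }

    // Walk the block buffer, swapping each 4-byte big-endian length prefix for a start code.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    char *data = NULL;
    size_t totalLength = 0;
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &totalLength, &data);

    size_t offset = 0;
    while (offset + 4 <= totalLength) {
        uint32_t nalLength = 0;
        memcpy(&nalLength, data + offset, sizeof(nalLength));
        nalLength = CFSwapInt32BigToHost(nalLength);

        [annexB appendBytes:startCode length:sizeof(startCode)];
        [annexB appendBytes:(data + offset + 4) length:nalLength];
        offset += 4 + nalLength;
    }
    return annexB;
}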
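
And this is my rough understanding of the PES layer from the second bullet: as far as I can tell, each access unit gets wrapped in a PES packet whose header carries the PTS in 90 kHz units. An untested sketch (the helper name is mine; a muxing library would presumably do this for me):

// Rough sketch: wrap one Annex B access unit (or AAC frame) in a minimal PES
// packet carrying only a PTS. streamID is e.g. 0xE0 for video or 0xC0 for audio.
static NSData *MakePESPacket(NSData *payload, uint8_t streamID, uint64_t pts)
{
    NSMutableData *pes = [NSMutableData data];

    // Bytes following the PES_packet_length field: 3 flag/length bytes + 5 PTS bytes + payload.
    NSUInteger contentLength = 3 + 5 + payload.length;
    uint16_t pesPacketLength = contentLength > 0xFFFF ? 0 : (uint16_t)contentLength; // 0 = unbounded (video only)

    uint8_t header[9] = {
        0x00, 0x00, 0x01,                       // packet_start_code_prefix
        streamID,                               // stream_id
        (uint8_t)(pesPacketLength >> 8),
        (uint8_t)(pesPacketLength & 0xFF),
        0x80,                                   // '10' marker bits, no scrambling/priority/copyright flags
        0x80,                                   // PTS_DTS_flags = '10' (PTS only)
        0x05,                                   // PES_header_data_length: 5 bytes of PTS follow
    };
    [pes appendBytes:header length:sizeof(header)];

    // 33-bit PTS packed into 5 bytes with marker bits, per ISO/IEC 13818-1.
    uint8_t ptsBytes[5] = {
        (uint8_t)(0x20 | ((pts >> 29) & 0x0E) | 0x01),
        (uint8_t)((pts >> 22) & 0xFF),
        (uint8_t)(((pts >> 14) & 0xFE) | 0x01),
        (uint8_t)((pts >> 7) & 0xFF),
        (uint8_t)(((pts << 1) & 0xFE) | 0x01),
    };
    [pes appendBytes:ptsBytes length:sizeof(ptsBytes)];

    [pes appendData:payload];
    return pes;
}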

I'm pretty new to this. Can I get some advice on the right approach to achieve this?

Is there an easier way to implement this using Apple APIs (like AVFoundation)?

Is there any similar project I can take as a reference?

Thanks in advance!

1 Answer

nevyn (accepted answer):

In order to mux the Packetized Elementary Streams I would also need some library like libmpegts. Or perhaps ffmpeg (by using libavcodec and libavformat libraries).

From what I can gather, there is no way to mux TS with AVFoundation or related frameworks. While it seems like something one can do manually, I'm trying to use the Bento4 library to accomplish the same task as you. I'm guessing libmpegts, ffmpeg, GPAC, libav, or any other library like that would work too, but I didn't like their APIs.

Basically, I'm following Mp42Ts.cpp, ignoring the MP4 parts and just looking at the TS-writing parts.

This StackOverflow question has a full outline of how to feed it video, and an implementation of how to feed it audio. If you have any questions, ping me with something more specific.

I hope this provides a good starting point for you, though.

I would also need an AAC or PCM elementary stream from the microphone data (I presume PCM would be easier since no encoding is involved), which I don't know how to do either.

Getting the microphone data as AAC is very straightforward. Something like this:

NSError *error = nil;

// Open the default microphone and wrap it in a capture input.
AVCaptureDevice *microphone = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
_audioInput = [AVCaptureDeviceInput deviceInputWithDevice:microphone error:&error];

if (_audioInput == nil) {
    NSLog(@"Couldn't open microphone %@: %@", microphone, error);
    return NO;
}

// Deliver audio sample buffers to a serial queue via the data-output delegate.
_audioProcessingQueue = dispatch_queue_create("audio processing queue", DISPATCH_QUEUE_SERIAL);

_audioOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioOutput setSampleBufferDelegate:self queue:_audioProcessingQueue];

// Ask the asset writer to encode the audio as mono 44.1 kHz AAC at 64 kbps.
NSDictionary *audioOutputSettings = @{
    AVFormatIDKey: @(kAudioFormatMPEG4AAC),
    AVNumberOfChannelsKey: @(1),
    AVSampleRateKey: @(44100.),
    AVEncoderBitRateKey: @(64000),
};

_audioWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioOutputSettings];
_audioWriterInput.expectsMediaDataInRealTime = YES;
if (![_writer canAddInput:_audioWriterInput]) {
    NSLog(@"Couldn't add audio input to writer");
    return NO;
}
[_writer addInput:_audioWriterInput];

// Wire the microphone into the (existing) capture session.
[_captureSession addInput:_audioInput];
[_captureSession addOutput:_audioOutput];

// On iOS this output delivers uncompressed PCM; appending it to _audioWriterInput
// lets the asset writer encode it to AAC using the settings above.
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    if (_audioWriterInput.readyForMoreMediaData) {
        [_audioWriterInput appendSampleBuffer:sampleBuffer];
    }
}

I'm guessing you're using an AVCaptureSession for your camera already; you can use the same capture session for the microphone.