Using AVSampleBufferAudioRenderer to play packets of streamed PCM audio (decoded from Opus)


Edit: Updated code based on suggestions, fixing the ASBD and making another attempt at getting PTS right. It still doesn't play any audio, but there are no errors anymore at least.


I'm working on an iOS project where I'm receiving packets of Opus audio data and attempting to play them using AVSampleBufferAudioRenderer. Right now I'm using Opus's own decoder, so ultimately I just need to get the decoded PCM packets to play. The whole process from top to bottom isn't especially well documented, but I think I'm getting close. Here's the code I'm working with so far (edited down, and with some hardcoded values for simplicity).

static AVSampleBufferAudioRenderer* audioRenderer;
static AVSampleBufferRenderSynchronizer* renderSynchronizer;
static OpusMSDecoder* opusDecoder;
static void* decodedPacketBuffer;

int samplesPerFrame = 240;
int channelCount    = 2;
int sampleRate      = 48000;
int streams         = 1;
int coupledStreams  = 1;
unsigned char mapping[8] = { 0, 1, 0, 0, 0, 0, 0, 0 };

CMTime startPTS;

// called when the stream is about to start
void AudioInit()
{
    renderSynchronizer = [[AVSampleBufferRenderSynchronizer alloc] init];
    audioRenderer = [[AVSampleBufferAudioRenderer alloc] init];
    [renderSynchronizer addRenderer:audioRenderer];
    
    int decodedPacketSize = samplesPerFrame * sizeof(short) * channelCount; // 240 samples per frame * 2 channels
    decodedPacketBuffer = SDL_malloc(decodedPacketSize);
    
    int err;
    opusDecoder = opus_multistream_decoder_create(sampleRate,       // 48000
                                                  channelCount,     // 2
                                                  streams,          // 1
                                                  coupledStreams,   // 1
                                                  mapping,
                                                  &err);

    [renderSynchronizer setRate:1.0 time:kCMTimeZero atHostTime:CMClockGetTime(CMClockGetHostTimeClock())];
    startPTS = CMClockGetTime(CMClockGetHostTimeClock());
}

// called every X milliseconds with a new packet of audio data to play, IF there's audio. (while testing, X = 5)
void AudioDecodeAndPlaySample(char* sampleData, int sampleLength)
{
    // decode the packet from Opus to (I think??) Linear PCM
    int numSamples;
    numSamples = opus_multistream_decode(opusDecoder,
                                         (unsigned char *)sampleData,
                                         sampleLength,
                                         (short*)decodedPacketBuffer,
                                         samplesPerFrame, // 240
                                         0);

    int bufferSize = sizeof(short) * numSamples * channelCount; // 240 samples * 2 channels

    CMTime currentPTS = CMTimeSubtract(CMClockGetTime(CMClockGetHostTimeClock()), startPTS);

    // LPCM stream description
    AudioStreamBasicDescription asbd = {
        .mFormatID          = kAudioFormatLinearPCM,
        .mFormatFlags       = kLinearPCMFormatFlagIsSignedInteger,
        .mBytesPerPacket    = sizeof(short) * channelCount,
        .mFramesPerPacket   = 1,
        .mBytesPerFrame     = sizeof(short) * channelCount,
        .mChannelsPerFrame  = channelCount, // 2
        .mBitsPerChannel    = 16,
        .mSampleRate        = sampleRate, // 48000
        .mReserved          = 0
    };
    
    // audio format description wrapper around asbd
    CMAudioFormatDescriptionRef audioFormatDesc;
    OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault,
                                                     &asbd,
                                                     0,
                                                     NULL,
                                                     0,
                                                     NULL,
                                                     NULL,
                                                     &audioFormatDesc);
    
    // data block to store decoded packet into
    CMBlockBufferRef blockBuffer;
    status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                decodedPacketBuffer,
                                                bufferSize,
                                                kCFAllocatorNull,
                                                NULL,
                                                0,
                                                bufferSize,
                                                0,
                                                &blockBuffer);
    
    // data block converted into a sample buffer
    CMSampleBufferRef sampleBuffer;
    status = CMAudioSampleBufferCreateReadyWithPacketDescriptions(kCFAllocatorDefault,
                                                                  blockBuffer,
                                                                  audioFormatDesc,
                                                                  numSamples,
                                                                  currentPTS,
                                                                  NULL,
                                                                  &sampleBuffer);
    
    
    // queueing sample buffer onto audio renderer
    [audioRenderer enqueueSampleBuffer:sampleBuffer];
}

The AudioDecodeAndPlaySample function comes from the library I'm working with, and as the comment says, is called with a packet of about 5 ms worth of samples at a time (and, important to note, does not get called if there's silence).

There are plenty of places here I could be wrong - I think I'm correct that the opus decoder (docs here) decodes into Linear PCM (interleaved), and I hope I'm building the AudioStreamBasicDescription correctly. I'm definitely not sure what to do with the PTS (presentation timestamp) in CMAudioSampleBufferCreateReadyWithPacketDescriptions - I'm trying to come up with a time based on current host time - init host time, but I have no idea if that works or not.

Most code examples I've seen of enqueueSampleBuffer have it wrapped in requestMediaDataWhenReady with a dispatch queue, which I have also tried to no avail. (I suspect it's more good practice than essential to functioning, so I'm just trying to get the simplest case working first; but if it is essential I can drop it back in.)
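For reference, the pattern I tried looks roughly like this (just a sketch - pendingQueue and StartFeedingRenderer are placeholder names I made up for this post, not anything from the library):

// Sketch: instead of enqueueing directly from the decode callback, push decoded
// sample buffers onto a CMSimpleQueue (lock-free for one producer / one consumer)
// and let the renderer's callback drain it.
static CMSimpleQueueRef pendingQueue;

void StartFeedingRenderer()
{
    CMSimpleQueueCreate(kCFAllocatorDefault, 64, &pendingQueue);

    dispatch_queue_t queue = dispatch_queue_create("audio.enqueue", DISPATCH_QUEUE_SERIAL);
    [audioRenderer requestMediaDataWhenReadyOnQueue:queue usingBlock:^{
        // Drain as many decoded buffers as the renderer will accept right now.
        while ([audioRenderer isReadyForMoreMediaData]) {
            CMSampleBufferRef buf = (CMSampleBufferRef)CMSimpleQueueDequeue(pendingQueue);
            if (!buf) break; // nothing decoded yet
            [audioRenderer enqueueSampleBuffer:buf];
            CFRelease(buf); // the renderer retains it; this balances the Create from the decode path
        }
    }];
}

// ...and in AudioDecodeAndPlaySample, instead of calling enqueueSampleBuffer:
//     CMSimpleQueueEnqueue(pendingQueue, sampleBuffer); // the queue doesn't retain, so keep the +1 from Create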

Feel free to respond using Swift if you're more comfortable with it; I can work with either. (I'm stuck with Objective-C here, like it or not.)


There are 2 answers

MeLean

It seems like you're on the right track with your iOS audio project. Your approach to decoding Opus audio data and attempting to play it using AVSampleBufferAudioRenderer is fundamentally sound, but there are a few potential issues and improvements to consider in your code.

// Global variable to keep track of the current PTS
// (kCMTimeZero isn't a compile-time constant, so set this in AudioInit rather
// than in a file-scope initializer)
static CMTime currentPTS;

void AudioDecodeAndPlaySample(char* sampleData, int sampleLength)
{
    // [Existing decoding logic]

    // Update PTS
    CMTime frameDuration = CMTimeMake(numSamples, sampleRate);
    currentPTS = CMTimeAdd(currentPTS, frameDuration);

    // [Existing LPCM stream description logic]

    // [Existing sample buffer creation logic]

    // Queueing sample buffer onto audio renderer within requestMediaDataWhenReady block
    [audioRenderer requestMediaDataWhenReadyOnQueue:dispatch_get_main_queue() usingBlock:^{
        if ([audioRenderer isReadyForMoreMediaData]) {
            CMSampleBufferSetOutputPresentationTimeStamp(sampleBuffer, currentPTS);
            [audioRenderer enqueueSampleBuffer:sampleBuffer];
            CFRelease(sampleBuffer); // Don't forget to release the sample buffer
        }
    }];
}

Gordon Childs

Congratulations on finding what I consider to be one of the more obscure Apple audio playback APIs!

As MeLean correctly pointed out, your sample timestamps were not progressing (you do need them).
In addition to that, the AudioStreamBasicDescription was wrong, and you didn't give the synchronizer a mapping between your timestamps' timeline and the host-time timeline.

The fixed ASBD:

// In uncompressed audio, a packet is one frame (mFramesPerPacket == 1).

// LPCM stream description
AudioStreamBasicDescription asbd = {
    .mFormatID          = kAudioFormatLinearPCM,
    .mFormatFlags       = kLinearPCMFormatFlagIsSignedInteger,
    .mBytesPerPacket    = sizeof(short) * channelCount,
    .mFramesPerPacket   = 1,
    .mBytesPerFrame     = sizeof(short) * channelCount,
    .mChannelsPerFrame  = channelCount, // 2
    .mBitsPerChannel    = 16,
    .mSampleRate        = sampleRate, // 48000
    .mReserved          = 0
};

One possible timeline mapping (a.k.a play it ASAP, consequences be damned):

[renderSynchronizer setRate:1.0 time:kCMTimeZero atHostTime:CMClockGetTime(CMClockGetHostTimeClock())];

Timestamps that progress:

// with your other variables
uint64_t samplesEnqueued = 0;

// ...

// data block converted into a sample buffer
CMSampleBufferRef sampleBuffer;
status = CMAudioSampleBufferCreateReadyWithPacketDescriptions(kCFAllocatorDefault,
                                                              blockBuffer,
                                                              audioFormatDesc,
                                                              numSamples,
                                                              CMTimeMake(samplesEnqueued, sampleRate),
                                                              NULL,
                                                              &sampleBuffer);


samplesEnqueued += numSamples;

// queueing sample buffer onto audio renderer
[audioRenderer enqueueSampleBuffer:sampleBuffer];

// ...

You have your own requirements for when you provide data to the renderer, but the code snippet in the API header file has the renderer call you instead. You can probably ignore this:

[audioRenderer requestMediaDataWhenReadyOnQueue:dispatch_get_main_queue() usingBlock:^{
    AudioDecodeAndPlaySample(sampleData, sampleLength);
    // get more sampleData
}];

p.s. your use of SDL_malloc suggests you might be using this code in a game. Last time I used AVSampleBufferAudioRenderer IIRC its latency was unimpressive, but I may have been holding it wrong. If low latency is a requirement, you may need to rethink your design.

p.p.s. silence => no callback means you'll have to adjust the timestamps to account for the missing silence frames.
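
One way to do that (a sketch building on the samplesEnqueued counter above; the host-clock comparison and the one-packet threshold are my own choices, so treat it as a starting point):

// Sketch: before computing the PTS for a new packet, check how much host time has
// passed since the previous packet. If the gap is clearly larger than one packet's
// worth of audio, silence was skipped, so jump the sample counter forward by the
// missing frames. lastPacketHostSeconds is a new variable, not from the code above.
static double lastPacketHostSeconds = 0;

void AccountForSilenceGap(int samplesInPacket)
{
    double now = CMTimeGetSeconds(CMClockGetTime(CMClockGetHostTimeClock()));
    if (lastPacketHostSeconds != 0) {
        double elapsed  = now - lastPacketHostSeconds;          // host-clock gap since the last packet
        double expected = (double)samplesInPacket / sampleRate; // ~5 ms per packet here
        if (elapsed > expected * 2) {
            uint64_t missingFrames = (uint64_t)((elapsed - expected) * sampleRate);
            samplesEnqueued += missingFrames; // skip the timeline over the silent stretch
        }
    }
    lastPacketHostSeconds = now;
}

// call AccountForSilenceGap(numSamples) at the top of AudioDecodeAndPlaySample,
// before building the sample buffer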