Ducking background music when using Text-To-Speech: postUtteranceDelay doesn't un-duck


Problem:

When using Text-To-Speech, I want background audio to dim (or 'duck'), speak an utterance, and then un-duck the background audio. It mostly works; however, when trying to un-duck, the background audio stays ducked, and no error is thrown during deactivation.

Context & Code:

The method that speaks an utterance:

// Create speech utterance
AVSpeechUtterance *speechUtterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];
speechUtterance.rate = instance.speechRate;
speechUtterance.pitchMultiplier = instance.speechPitch;
speechUtterance.volume = instance.speechVolume;
speechUtterance.postUtteranceDelay = 0.005;

AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:instance.voiceLanguageCode];
speechUtterance.voice = voice;

// Stop any in-progress speech before starting the new utterance
if (instance.speechSynthesizer.isSpeaking) {
    [instance.speechSynthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];
}

// Activate the audio session so that background audio ducks
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
NSError *activationError = nil;
[audioSession setActive:YES error:&activationError];
if (activationError) {
    NSLog(@"Error activating: %@", activationError);
}

[instance.speechSynthesizer speakUtterance:speechUtterance]; 

Then I deactivate the session when the utterance has finished speaking:

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance
{
    dispatch_queue_t myQueue = dispatch_queue_create("com.company.appname", nil);
    dispatch_async(myQueue, ^{
        NSError *error = nil;

        if (![[AVAudioSession sharedInstance] setActive:NO withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error]) {
            NSLog(@"Error deactivating: %@", error);
        }
    });
}

Setting the app's audio category in the App Delegate:

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *setCategoryError = nil;
    [audioSession setCategory:AVAudioSessionCategoryPlayback
                  withOptions:AVAudioSessionCategoryOptionDuckOthers
                        error:&setCategoryError];
    if (setCategoryError) {
        NSLog(@"Error setting category: %@", setCategoryError);
    }

    return YES;
}

What I have tried:

The ducking/unducking works when I deactivate the AVAudioSession after a delay:

dispatch_time_t popTime = dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.2 * NSEC_PER_SEC));
dispatch_after(popTime, dispatch_queue_create("com.company.appname", nil), ^(void){
    NSError *error = nil;

    if (![[AVAudioSession sharedInstance] setActive:NO withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error]) {
        NSLog(@"Error deactivating: %@", error);
    }
});

However, the delay is noticeable and I get an error in the console:

[avas] AVAudioSession.mm:1074:-[AVAudioSession setActive:withOptions:error:]: Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session.

Question:

How can I combine AVSpeechSynthesizer with ducking of background audio properly?

EDIT: Apparently the issue stems from using postUtteranceDelay on AVSpeechUtterance, which causes the music to stay dimmed. Removing that property fixes the issue. However, I need postUtteranceDelay for some of my utterances, so I have updated the title.
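
A possible workaround, sketched below with illustrative values (this is an assumption based on the behaviour described in the accepted answer, not a confirmed fix): keep postUtteranceDelay on the utterance that needs it, queue a short trailing utterance without a delay, and only deactivate the session once that trailing utterance finishes.

// Sketch of a possible workaround (assumption, not a confirmed fix):
// the utterance that genuinely needs the delay keeps it...
AVSpeechUtterance *mainUtterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];
mainUtterance.postUtteranceDelay = 0.005;

// ...followed by a hypothetical trailing utterance with no postUtteranceDelay,
// so the synthesizer finishes cleanly and the background audio can un-duck.
AVSpeechUtterance *trailingUtterance = [[AVSpeechUtterance alloc] initWithString:@" "];

[instance.speechSynthesizer speakUtterance:mainUtterance];
[instance.speechSynthesizer speakUtterance:trailingUtterance];

// In the delegate, deactivate the session only when trailingUtterance is the
// utterance passed to -speechSynthesizer:didFinishSpeechUtterance:.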

There are 2 answers

Accepted answer, by Casey:

The ducking worked (started and stopped) without any issue or error using your code while listening to Spotify. I used an iPhone 6S on iOS 9.1, so it is possible that this is an iOS 10 issue.

I would recommend removing the dispatch wrap entirely, as it shouldn't be necessary. This may resolve the issue for you.

A working code sample is below. All I did was create a new project ("Single View Application") and change my AppDelegate.m to look like this:

#import "AppDelegate.h"
@import AVFoundation;

@interface AppDelegate () <AVSpeechSynthesizerDelegate>
@property (nonatomic, strong) AVSpeechSynthesizer *speechSynthesizer;
@end

@implementation AppDelegate


- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];

    NSError *setCategoryError = nil;
    [audioSession setCategory:AVAudioSessionCategoryPlayback withOptions:AVAudioSessionCategoryOptionDuckOthers error:&setCategoryError];
    if (setCategoryError) {
        NSLog(@"error setting up: %@", setCategoryError);
    }

    self.speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    self.speechSynthesizer.delegate = self;

    AVSpeechUtterance *speechUtterance = [[AVSpeechUtterance alloc] initWithString:@"Hi there, how are you doing today?"];

    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"en-US"];
    speechUtterance.voice = voice;

    NSError *activationError = nil;
    [audioSession setActive:YES error:&activationError];
    if (activationError) {
        NSLog(@"Error activating: %@", activationError);
    }

    [self.speechSynthesizer speakUtterance:speechUtterance];

    return YES;
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSError *error = nil;
    if (![[AVAudioSession sharedInstance] setActive:NO withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error]) {
        NSLog(@"Error deactivating: %@", error);
    }
}

@end

The only output from the console when running on a physical device is:

2016-12-21 09:42:08.484 DimOtherAudio[19017:3751445] Building MacinTalk voice for asset: (null)

UPDATE

Setting the postUtteranceDelay property created the same issue for me.

The documentation for postUtteranceDelay states:

The amount of time a speech synthesizer will wait after the utterance is spoken before handling the next queued utterance.

When two or more utterances are spoken by an instance of AVSpeechSynthesizer, the time between periods when either is audible will be at least the sum of the first utterance’s postUtteranceDelay and the second utterance’s preUtteranceDelay.
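
To illustrate how the two delays combine, here is a minimal sketch with made-up values (it assumes the same self.speechSynthesizer as in the sample above):

AVSpeechUtterance *first = [[AVSpeechUtterance alloc] initWithString:@"First sentence."];
first.postUtteranceDelay = 0.5;   // wait 0.5 s after this utterance finishes

AVSpeechUtterance *second = [[AVSpeechUtterance alloc] initWithString:@"Second sentence."];
second.preUtteranceDelay = 0.3;   // wait 0.3 s before this utterance starts

// The silent gap between the two is at least 0.5 + 0.3 = 0.8 seconds.
[self.speechSynthesizer speakUtterance:first];
[self.speechSynthesizer speakUtterance:second];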

It is pretty clear from the documentation that this value is only designed to be used when another utterance will be queued. I confirmed that adding a second utterance which doesn't set postUtteranceDelay un-ducks the audio.

AVAudioSession *audioSession = [AVAudioSession sharedInstance];

NSError *setCategoryError = nil;
[audioSession setCategory:AVAudioSessionCategoryPlayback withOptions:AVAudioSessionCategoryOptionDuckOthers error:&setCategoryError];
if (setCategoryError) {
    NSLog(@"error setting up: %@", setCategoryError);
}

self.speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
self.speechSynthesizer.delegate = self;

AVSpeechUtterance *speechUtterance = [[AVSpeechUtterance alloc] initWithString:@"Hi there, how are you doing today?"];
speechUtterance.postUtteranceDelay = 0.005;

AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"en-US"];
speechUtterance.voice = voice;

NSError *activationError = nil;
[audioSession setActive:YES error:&activationError];
if (activationError) {
    NSLog(@"Error activating: %@", activationError);
}

[self.speechSynthesizer speakUtterance:speechUtterance];

// second utterance without postUtteranceDelay
AVSpeechUtterance *speechUtterance2 = [[AVSpeechUtterance alloc] initWithString:@"Duck. Duck. Goose."];
[self.speechSynthesizer speakUtterance:speechUtterance2];

Answer by Jeff Scaturro:

Here's my Swift 3 version, taken from Casey's answer above:

import Foundation
import AVFoundation

class Utils: NSObject {
    static let shared = Utils()

    let synth = AVSpeechSynthesizer()
    let audioSession = AVAudioSession.sharedInstance()

    override init() {
        super.init()

        synth.delegate = self
    }

    func say(sentence: String) {
        do {
            try audioSession.setCategory(AVAudioSessionCategoryPlayback, with: AVAudioSessionCategoryOptions.duckOthers)

            let utterance = AVSpeechUtterance(string: sentence)

            try audioSession.setActive(true)

            synth.speak(utterance)
        } catch {
            print("Uh oh!")
        }
    }
}

extension Utils: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        do {
            try audioSession.setActive(false)
        } catch {
            print("Uh oh!")
        }
    }
}

I then call this from anywhere in my app like this: Utils.shared.say(sentence: "Thanks Casey!")