Continuous speech recognition without restart after 1 minute

I'm trying to create an app that records the user's voice and transcribes it at the same time. I'm using AVFoundation and the Speech framework for this. The problem is that Apple limits each transcription to one minute, so after that period I have to start a new speech recognition request. However, I also want to keep recording the voice without interruption while that happens.

Does anyone know how I can fix this issue?

This is the code that I'm using:

    private func startRecording() throws {

        // Cancel the previous task if it's running.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            self.recognitionTask = nil
        }

        try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, with: .allowBluetoothA2DP)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        guard let inputNode = audioEngine.inputNode else { fatalError("Audio engine has no input node") }
        guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object") }

        // Configure the request so that results are returned before audio recording is finished.
        recognitionRequest.shouldReportPartialResults = true

        // A recognition task represents a speech recognition session.
        // We keep a reference to the task so that it can be cancelled.
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
            var isFinal = false

            if let result = result {
                // Show the latest (partial or final) transcription.
                self.textView.text = result.bestTranscription.formattedString

                isFinal = result.isFinal
                if isFinal {
                    self.textView.text.append(result.bestTranscription.formattedString)
                }
            }

            if error != nil || isFinal {

                print("Error: \(String(describing: error))")
                print("isFinal: \(isFinal)")
                self.audioEngine.stop()
                // Removing the tap here also stops the file writing done in the tap block.
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil

                // Restart recognition once the current task has ended.
                try! self.startRecording()
                self.recordButton.isEnabled = true
                self.recordButton.setTitle("Start Recording", for: [])
            }
        }

        let recordingFormat = inputNode.outputFormat(forBus: 0)

        // The same tap feeds the recognizer and writes the audio file.
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in

            DispatchQueue.main.async {
                self.recognitionRequest?.append(buffer)
                self.writeBuffer(buffer)
            }
        }

        if !audioEngine.isRunning {
            audioEngine.prepare()
            try audioEngine.start()
        }

    }

As you can see from the code, I both feed the recognition request and write the audio file inside the installTap block. So every time I have to restart the transcription, I also have to remove the tap on the bus, which means I can't keep recording the audio file without interruption.

Is there something that I could do? Any solutions? Alternatives?

There is 1 answer

Tom Durrant

You can install a tap on the audioEngine's mainMixerNode to do the recording. This should enable you to remove the tap on the inputNode without interrupting the recording.

Alternatively, just assign a new request to self.recognitionRequest without removing the tap. The existing tap should then automatically append its buffers to the new request.
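A rough sketch of that second suggestion, assuming a recent SDK where `audioEngine.inputNode` is non-optional. The class name `ContinuousTranscriber` and its methods are hypothetical, and audio session setup, speech authorization, and the question's `writeBuffer` call are left out. The tap is installed once and always forwards to whichever request is current, so only the request and task are replaced when a task ends:

```swift
import AVFoundation
import Speech

// Hypothetical helper: the tap is installed once and never removed on restart.
final class ContinuousTranscriber {
    private let audioEngine = AVAudioEngine()
    private let speechRecognizer = SFSpeechRecognizer()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    var isRunning: Bool { audioEngine.isRunning }

    // Call on the main thread; requests are only touched from the main queue.
    func start() throws {
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, _ in
            // Hop to the main queue, as the question does, so the request
            // is not read and replaced from different threads.
            DispatchQueue.main.async {
                self?.recognitionRequest?.append(buffer)
                // Writing the buffer to the audio file would also go here.
            }
        }
        audioEngine.prepare()
        try audioEngine.start()
        startNewRequest()
    }

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
        recognitionRequest?.endAudio()
        recognitionTask?.cancel()
        recognitionRequest = nil
        recognitionTask = nil
    }

    // Replaces only the request and task; the tap keeps running.
    private func startNewRequest() {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        recognitionRequest = request
        recognitionTask = speechRecognizer?.recognitionTask(with: request) { [weak self] result, error in
            let isFinal = result?.isFinal ?? false
            guard error != nil || isFinal else { return }
            // Assumption: the handler runs on the recognizer's default (main) queue.
            // A real app would want a retry limit here, since a persistent
            // error would otherwise restart in a loop.
            guard let self = self, self.audioEngine.isRunning else { return }
            self.startNewRequest()
        }
    }
}
```

Buffers captured between the end of one task and the start of the next still go to the new request only once it has been assigned, so this alone is not expected to remove gaps in the transcription.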

When I tried to do the same thing, I was able to start a new recognition request without interrupting the recording. However, I was not able to prevent gaps in the transcription. It seems that the first recognition request has to finish before a second one can be started, and some buffers are lost in between. It may be possible to keep these buffers in memory until the second request starts.
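The idea of holding buffers in memory could be sketched roughly as below. `PendingBufferQueue` is a hypothetical helper, not part of any Apple framework. It is generic so it can be tried without audio hardware, it is not thread-safe, and it assumes the captured buffers can safely be retained until they are delivered:

```swift
// Hypothetical helper: stores buffers while no consumer is attached and
// delivers them, in order, as soon as one is attached again.
final class PendingBufferQueue<Buffer> {
    private var pending: [Buffer] = []
    private var sink: ((Buffer) -> Void)?
    private let capacity: Int

    init(capacity: Int = 512) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    var pendingCount: Int { pending.count }

    // Call for every captured buffer, for example from the tap block.
    func append(_ buffer: Buffer) {
        if let sink = sink {
            sink(buffer)
        } else {
            // No consumer yet: keep the buffer, dropping the oldest when full.
            if pending.count == capacity { pending.removeFirst() }
            pending.append(buffer)
        }
    }

    // Attach a consumer; anything stored in the meantime is delivered first.
    func attach(_ newSink: @escaping (Buffer) -> Void) {
        pending.forEach(newSink)
        pending.removeAll()
        sink = newSink
    }

    // Detach the consumer, for example when a recognition task ends.
    func detach() { sink = nil }
}

// Usage with plain integers standing in for audio buffers.
let queue = PendingBufferQueue<Int>()
var received: [Int] = []
queue.attach { received.append($0) }   // first "request"
queue.append(1)
queue.detach()                         // first request finished
queue.append(2)                        // stored while nothing is attached
queue.append(3)
queue.attach { received.append($0) }   // second "request" gets 2 and 3 first
queue.append(4)
// received is now [1, 2, 3, 4]
```

In the recording code, the tap would call `append` with each `AVAudioPCMBuffer`, and every new request would be attached with a closure that forwards to the request's `append(_:)`. Whether the recognizer then produces a gap-free transcription across requests is not something this sketch can guarantee.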