Azure Speech diarization failing to tag speakers properly until a long 7-second statement is spoken


The Azure Speech private preview for diarization used to assign the speaker tag "Unknown" until it recognised a long (roughly 7-second) statement from a speaker. With the API in public preview, it has started tagging speakers as Guest-N straight away, which raises an accuracy concern: even when Guest-1 has already been detected, short sentences get tagged Guest-2 until Guest-2 speaks a long sentence, and so on.

Is there a solution to get the private preview behaviour back?

As per the documentation, shorter sentences should still be marked as Unknown:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=windows&pivots=programming-language-csharp

SDK version used (Gradle dependency):

    implementation group: 'com.microsoft.cognitiveservices.speech', name: 'client-sdk', version: '1.34.0'
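For reference, the tag in question surfaces in the SDK's transcribed callback. Below is a minimal sketch of where it is observed (the file name is a placeholder and the stop handling is deliberately simplified; see the answer below for the full quickstart wiring):

    import com.microsoft.cognitiveservices.speech.SpeechConfig;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    import com.microsoft.cognitiveservices.speech.transcription.ConversationTranscriber;

    public class SpeakerTagDemo {
        public static void main(String[] args) throws Exception {
            // Key and region are read from the environment; meeting.wav is a placeholder.
            SpeechConfig config = SpeechConfig.fromSubscription(
                    System.getenv("SPEECH_KEY"), System.getenv("SPEECH_REGION"));
            config.setSpeechRecognitionLanguage("en-US");
            AudioConfig audio = AudioConfig.fromWavFileInput("meeting.wav");

            ConversationTranscriber transcriber = new ConversationTranscriber(config, audio);
            transcriber.transcribed.addEventListener((s, e) -> {
                // Private preview: short utterances arrived as Speaker ID=Unknown.
                // Public preview: they arrive with a provisional Guest-N tag instead.
                System.out.println("Text=" + e.getResult().getText()
                        + " Speaker ID=" + e.getResult().getSpeakerId());
            });

            transcriber.startTranscribingAsync().get();
            Thread.sleep(30_000); // crude wait for the sketch; real code should stop on an event
            transcriber.stopTranscribingAsync().get();
            transcriber.close();
        }
    }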


1 Answer

Naveen Sharma answered:

Diarization is described as the process of segmenting audio containing multiple speakers into discrete speech segments based on the identity of the speaker during each segment.

  • It is crucial for understanding “who is speaking when” in a speech recognition pipeline.

Note: Real-time diarization is currently in public preview.

  • The documentation emphasizes the significance of diarization in various scenarios, including podcast sessions, call-center calls, doctor-patient interactions, and team meetings.
  • It also states that diarization is essential for providing context to downstream NLP systems, as it enables the modelling of conversations.
  • The code below is taken from the Real-time diarization quickstart on GitHub.
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    import com.microsoft.cognitiveservices.speech.transcription.ConversationTranscriber;

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Semaphore;

    public class ConversationTranscription {

        // Read the key and region from the environment rather than hard-coding them.
        private static String speechKey = System.getenv("SPEECH_KEY");
        private static String speechRegion = System.getenv("SPEECH_REGION");

        public static void main(String[] args) throws InterruptedException, ExecutionException {

            SpeechConfig speechConfig = SpeechConfig.fromSubscription(speechKey, speechRegion);
            speechConfig.setSpeechRecognitionLanguage("en-US");
            AudioConfig audioInput = AudioConfig.fromWavFileInput("katiesteve.wav");

            Semaphore stopRecognitionSemaphore = new Semaphore(0);

            ConversationTranscriber conversationTranscriber = new ConversationTranscriber(speechConfig, audioInput);
            {
                // Subscribes to events.
                conversationTranscriber.transcribing.addEventListener((s, e) -> {
                    System.out.println("TRANSCRIBING: Text=" + e.getResult().getText());
                });

                // Final results carry the speaker tag (Unknown or Guest-N).
                conversationTranscriber.transcribed.addEventListener((s, e) -> {
                    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                        System.out.println("TRANSCRIBED: Text=" + e.getResult().getText() + " Speaker ID=" + e.getResult().getSpeakerId());
                    }
                    else if (e.getResult().getReason() == ResultReason.NoMatch) {
                        System.out.println("NOMATCH: Speech could not be transcribed.");
                    }
                });

                conversationTranscriber.canceled.addEventListener((s, e) -> {
                    System.out.println("CANCELED: Reason=" + e.getReason());

                    if (e.getReason() == CancellationReason.Error) {
                        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
                        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
                        System.out.println("CANCELED: Did you update the subscription info?");
                    }

                    // The service cancels when the WAV file is exhausted, releasing the wait below.
                    stopRecognitionSemaphore.release();
                });

                conversationTranscriber.sessionStarted.addEventListener((s, e) -> {
                    System.out.println("\n    Session started event.");
                });

                conversationTranscriber.sessionStopped.addEventListener((s, e) -> {
                    System.out.println("\n    Session stopped event.");
                });

                conversationTranscriber.startTranscribingAsync().get();

                // Waits for completion.
                stopRecognitionSemaphore.acquire();

                conversationTranscriber.stopTranscribingAsync().get();
            }

            speechConfig.close();
            audioInput.close();
            conversationTranscriber.close();

            System.exit(0);
        }
    }


Output:

(Screenshot: console output showing TRANSCRIBED lines with text and Guest-N speaker IDs.)
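There is no documented switch to restore the private-preview labelling, but it can be approximated on the client side. The sketch below is a hypothetical helper (not part of the SDK): it treats a Guest-N tag as unconfirmed until that guest has produced at least one utterance of roughly 7 seconds, and reports shorter segments as Unknown in the meantime, mirroring the behaviour described in the question.

    import java.math.BigInteger;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical client-side filter approximating the private-preview labelling:
    // a Guest-N tag is only trusted once that guest has spoken a long utterance.
    public class SpeakerTagFilter {
        private static final long CONFIRM_TICKS = 7L * 10_000_000L; // 7 s in 100-ns ticks
        private final Set<String> confirmedSpeakers = new HashSet<>();

        /** Returns the speaker tag to report for one transcribed segment. */
        public String filter(String speakerId, BigInteger durationTicks) {
            if (durationTicks.longValue() >= CONFIRM_TICKS) {
                // A long utterance confirms this guest tag for the rest of the session.
                confirmedSpeakers.add(speakerId);
            }
            return confirmedSpeakers.contains(speakerId) ? speakerId : "Unknown";
        }
    }

Inside the transcribed handler above, this would be called as filter.filter(e.getResult().getSpeakerId(), e.getResult().getDuration()), since getDuration() on a recognition result reports the utterance length in 100-nanosecond ticks.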