Azure - Speech To Text - detect speaker channel

Question

308 views · Asked by Jakub Holovsky on 27 November 2023 at 13:30

I am using Azure Speech To Text continuous recognition to transcribe an audio file. My speakers are split across the left and right channels of a stereo WAV file. However, when I run the transcription I am not able to get the channel correctly. I tried to read it from PropertyId.SpeechServiceResponse_JsonResult, but that always returns 0. My expectation is 0 for the left channel and 1 for the right channel.

var speechConfig = SpeechConfig.FromSubscription(/*api key*/, /*region*/);
var audioConfig = AudioConfig.FromWavFileInput(filePath);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Is there some hidden property or missing configuration to achieve this?

My attempt to read the channel from the JsonResult property:

// Raw JSON response from the service for this recognition result.
var speechServiceResponseJsonResultJson = eventArgs.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);

var channel = 0;
// GetProperty can return an empty string, so check for that as well as null.
if (!string.IsNullOrEmpty(speechServiceResponseJsonResultJson))
{
    // Parse the JSON that was already fetched instead of reading the property a second time.
    var speechServiceResponseJsonResult = JObject.Parse(speechServiceResponseJsonResultJson);

    if (speechServiceResponseJsonResult.TryGetValue("Channel", StringComparison.InvariantCultureIgnoreCase, out var channelValue))
    {
        channel = channelValue.ToObject<int>();
    }
}

There is 1 answer

**Rishabh Meshram** · Accepted Answer · 2023-11-30T10:30:59+00:00

It appears that the SpeechServiceResponse_JsonResult property does not provide the speaker channel information. The Azure Speech to Text service does not directly provide a way to differentiate between left and right channels in a stereo audio file. The documentation does not mention any property or configuration that would allow you to achieve this directly.

A possible workaround for transcribing a stereo audio file could be to split the stereo audio file into two separate mono audio files, transcribe each mono audio file separately using Azure Speech To Text, and then combine the transcriptions while keeping track of which channel the transcription came from.
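The splitting step could look roughly like the sketch below. It assumes uncompressed, interleaved PCM samples (for example 16-bit, so `bytesPerSample` is 2) that have already been read out of the stereo file's data chunk. `StereoSplitter`, `Deinterleave`, and `WriteMonoWav` are hypothetical helper names and are not part of the Speech SDK.

```csharp
using System;
using System.IO;
using System.Text;

public static class StereoSplitter
{
    // Splits interleaved stereo PCM (L R L R ...) into two mono buffers.
    public static (byte[] Left, byte[] Right) Deinterleave(byte[] pcm, int bytesPerSample)
    {
        int frameSize = bytesPerSample * 2;      // one left sample + one right sample
        int frames = pcm.Length / frameSize;     // any trailing partial frame is ignored
        var left = new byte[frames * bytesPerSample];
        var right = new byte[frames * bytesPerSample];

        for (int i = 0; i < frames; i++)
        {
            Buffer.BlockCopy(pcm, i * frameSize, left, i * bytesPerSample, bytesPerSample);
            Buffer.BlockCopy(pcm, i * frameSize + bytesPerSample, right, i * bytesPerSample, bytesPerSample);
        }
        return (left, right);
    }

    // Writes a single-channel PCM WAV file with a minimal 44-byte header.
    public static void WriteMonoWav(string path, byte[] pcm, int sampleRate, short bitsPerSample)
    {
        short blockAlign = (short)(bitsPerSample / 8);   // bytes per mono sample
        using var writer = new BinaryWriter(File.Create(path));
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + pcm.Length);                   // RIFF chunk size
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                                // fmt chunk size for PCM
        writer.Write((short)1);                          // audio format: PCM
        writer.Write((short)1);                          // channel count: mono
        writer.Write(sampleRate);
        writer.Write(sampleRate * blockAlign);           // byte rate
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(pcm.Length);                        // data chunk size
        writer.Write(pcm);
    }
}
```

Each mono file written this way can then be passed to `AudioConfig.FromWavFileInput` and transcribed with its own `SpeechRecognizer`, so the channel is known from which file produced the result.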

This approach will allow you to know which channel the transcription is coming from, as you will be processing each channel separately.
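Combining the two per-channel transcripts can be sketched as a merge ordered by each utterance's start time. The `Utterance` record and `TranscriptMerger` class below are hypothetical names used only for illustration; the sketch assumes each result's start offset has been captured during recognition, for example via `TimeSpan.FromTicks(e.Result.OffsetInTicks)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One recognized phrase, tagged with the channel it came from (0 = left, 1 = right).
public record Utterance(int Channel, TimeSpan Offset, string Text);

public static class TranscriptMerger
{
    // Interleaves both channels into one timeline ordered by start offset.
    // OrderBy is a stable sort, so ties keep the left channel first.
    public static List<Utterance> Merge(IEnumerable<Utterance> left, IEnumerable<Utterance> right) =>
        left.Concat(right).OrderBy(u => u.Offset).ToList();
}
```

Because both mono files were cut from the same recording, their offsets share one clock, which is what makes ordering by offset meaningful.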

Also, since you mentioned that you want to identify speaker IDs alongside the transcript, you can use conversation transcription with diarization, which helps distinguish between speakers and includes a speaker ID in the output.

With this sample code, I was able to get the transcribed text together with a speaker ID.
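For reference, a minimal sketch of conversation transcription with the Speech SDK's `ConversationTranscriber` might look like the following. This is not the answerer's original sample. It assumes the `Microsoft.CognitiveServices.Speech` NuGet package, a valid key and region, and an `en-US` recording; `DiarizationSketch`, `FormatLine`, and `TranscribeAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

public static class DiarizationSketch
{
    // Formats one transcript line; kept separate so it can be checked without calling the service.
    public static string FormatLine(string speakerId, string text) => $"[{speakerId}] {text}";

    public static async Task TranscribeAsync(string key, string region, string wavPath)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);
        speechConfig.SpeechRecognitionLanguage = "en-US"; // assumption: English audio

        // Completed when the session ends or is canceled.
        var stopped = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        using var audioConfig = AudioConfig.FromWavFileInput(wavPath);
        using var transcriber = new ConversationTranscriber(speechConfig, audioConfig);

        // Each final result carries the recognized text and a speaker ID.
        transcriber.Transcribed += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine(FormatLine(e.Result.SpeakerId, e.Result.Text));
            }
        };
        transcriber.Canceled += (s, e) => stopped.TrySetResult(false);
        transcriber.SessionStopped += (s, e) => stopped.TrySetResult(true);

        await transcriber.StartTranscribingAsync();
        await stopped.Task;
        await transcriber.StopTranscribingAsync();
    }
}
```

Note that the speaker ID here reflects voices the service distinguishes, not the audio channel, so it complements the mono-split workaround rather than replacing it.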
