Adding directline voice to waterfall dialog bot using Microsoft bot framework


I am trying to add the ability to use the Direct Line Speech channel in my dialog bot. I was reading the tutorial from Microsoft on how to do this, but it only covers the echo bot. I want to use the dialog bot and have it return voice. I have already created a Speech resource in Azure and enabled the Direct Line Speech channel on the bot resource in Azure. Has anyone been successful in adding voice to a dialog bot? I read that there would be prompt options for speech, but I cannot find that property on my PromptOptions object.


1 Answer

Answered by Steven Kanberg:

How speech is configured depends on the type of speech you mean to use, which may mean updating your bot as well as the client you are using.

A quick note about clients (i.e. channels) - the channel is what determines whether speech is supported or not. For instance:

  • Outside of designing a calling bot, STT (speech-to-text) is not supported in Teams.
  • Cortana is no longer a public offering and is now only accessible via an Enterprise account.
  • Web Chat, which is built on top of Direct Line and has DL Speech integrated into it, supports both CS and DL Speech.
    • It's important to note that DL Speech is its own channel separate from Direct Line, therefore it requires additional code in your bot. The samples located at BotBuilder-Samples include this code, by default.
    • Custom-built clients using either DL or DL Speech would require you to build in functionality that would allow Speech to work.
  • Other channels, such as Telegram, Slack, etc., handle speech in a channel-specific way and, of course, don't rely on CS or DL Speech. A custom-built third-party channel that supports speech would require you to consult that channel's docs for implementation.

Regarding DL Speech, you need to add / update your bot's index.js code to include the following:

[...]

// Catch-all for errors.
const onTurnErrorHandler = async (context, error) => {
    // This check writes out errors to console log vs. app insights.
    // NOTE: In production environment, you should consider logging this to Azure
    //       application insights. See https://aka.ms/bottelemetry for telemetry 
    //       configuration instructions.
    console.error(`\n [onTurnError] unhandled error: ${ error }`);

    // Send a trace activity, which will be displayed in Bot Framework Emulator
    await context.sendTraceActivity(
        'OnTurnError Trace',
        `${ error }`,
        'https://www.botframework.com/schemas/error',
        'TurnError'
    );

    // Send a message to the user
    await context.sendActivity('The bot encountered an error or bug.');
    await context.sendActivity('To continue to run this bot, please fix the bot source code.');
};

// Set the onTurnError for the singleton BotFrameworkAdapter.
adapter.onTurnError = onTurnErrorHandler;

[...]

// Listen for Upgrade requests for Streaming.
server.on('upgrade', (req, socket, head) => {
    // Create an adapter scoped to this WebSocket connection to allow storing session data.
    const streamingAdapter = new BotFrameworkAdapter({
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword
    });
    // Set onTurnError for the BotFrameworkAdapter created for each connection.
    streamingAdapter.onTurnError = onTurnErrorHandler;

    streamingAdapter.useWebSocket(req, socket, head, async (context) => {
        // After connecting via WebSocket, run this logic for every request sent over
        // the WebSocket connection.
        await myBot.run(context);
    });
});
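
A side note on the PromptOptions question: PromptOptions itself doesn't expose a speech-specific property; the spoken text rides on the outgoing activity's speak field. Below is a minimal, illustrative sketch of a waterfall step (the 'TEXT_PROMPT' id and the nameStep name are my own placeholders, not part of the samples above):

const { MessageFactory, InputHints } = require('botbuilder');

// Illustrative waterfall step; assumes a TextPrompt was added to the
// dialog set under the id 'TEXT_PROMPT'.
const nameStep = async (stepContext) => {
    // MessageFactory.text(text, speak, inputHint) sets the activity's
    // speak property, which speech-enabled channels read aloud.
    const prompt = MessageFactory.text(
        'What is your name?',    // text shown in the client
        'What is your name?',    // text spoken by the channel
        InputHints.ExpectingInput
    );

    return stepContext.prompt('TEXT_PROMPT', { prompt });
};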

Then, in Web Chat, you would pass in the following. (You can reference the code below in this DL Speech sample. Also note that you will want to point the "fetch" address at an API of your own for generating the token; a sketch of such an endpoint follows the code below.):

[...]

const fetchCredentials = async () => {
    const res = await fetch('https://webchat-mockbot-streaming.azurewebsites.net/speechservices/token', {
        method: 'POST'
    });

    if (!res.ok) {
        throw new Error('Failed to fetch authorization token and region.');
    }

    const { region, token: authorizationToken } = await res.json();

    return { authorizationToken, region };
};

// Create a set of adapters for Web Chat to use with Direct Line Speech channel.
const adapters = await window.WebChat.createDirectLineSpeechAdapters({
    fetchCredentials
});

// Pass the set of adapters to Web Chat.
window.WebChat.renderWebChat(
    {
        ...adapters
    },
    document.getElementById('webchat')
);

[...]
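
As noted above, you'll want to generate the speech token from your own API rather than the mock bot's endpoint. Here is a minimal sketch of such an endpoint, assuming the restify server from index.js above, restify 8+ / Node 18+ (for async handlers and the built-in fetch), and my own placeholder route and environment variable names; it returns the { region, token } shape that fetchCredentials expects:

// Sketch only: route and env var names (SPEECH_SERVICES_REGION,
// SPEECH_SERVICES_SUBSCRIPTION_KEY) are placeholders for your own setup.
server.post('/speechservices/token', async (req, res) => {
    const region = process.env.SPEECH_SERVICES_REGION;

    // Exchange the subscription key for a short-lived token so the key
    // itself never reaches the browser.
    const tokenRes = await fetch(`https://${ region }.api.cognitive.microsoft.com/sts/v1.0/issueToken`, {
        method: 'POST',
        headers: { 'Ocp-Apim-Subscription-Key': process.env.SPEECH_SERVICES_SUBSCRIPTION_KEY }
    });

    if (!tokenRes.ok) {
        res.send(500, { error: 'Failed to issue speech token.' });
        return;
    }

    // Return the { region, token } shape that fetchCredentials above expects.
    res.send(200, { region, token: await tokenRes.text() });
});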

Here are some additional resources to help you better understand DL Speech:


Regarding CS Speech, you need an active Cognitive Services subscription. Once you have your Speech service set up in Azure, you use the subscription key to generate the token that enables CS Speech (you can also reference this Web Chat sample). No changes to your bot are necessary. (Again, you will want to set up an API for generating the token, as best practice is to NOT include any keys in the HTML. This is what I do in this example for getting a DL token; a sketch of such an endpoint follows the code below.):

let authorizationToken;
let region = '<<SPEECH SERVICES REGION>>';

const response = await fetch( `https://${ region }.api.cognitive.microsoft.com/sts/v1.0/issueToken`, {
  method: 'POST',
  headers: {
    'Ocp-Apim-Subscription-Key': '<<SUBSCRIPTION KEY>>'
  }
} );
if ( response.status === 200 ) {
  authorizationToken = await response.text();
} else {
  console.log( 'Failed to issue speech token.' );
}

const webSpeechPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory( {
  authorizationToken,
  region
} );

const res = await fetch( 'http://localhost:3500/directline/token', { method: 'POST' } );
const { token } = await res.json();

window.WebChat.renderWebChat(
  {
    directLine: window.WebChat.createDirectLine( {
      token: token
    } ),
    webSpeechPonyfillFactory: webSpeechPonyfillFactory,
  },
  document.getElementById( 'webchat' )
);
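
For reference, the 'http://localhost:3500/directline/token' call above assumes you expose an endpoint that exchanges your Direct Line secret for a token. Here is a minimal sketch, again assuming the restify server from index.js, restify 8+ / Node 18+, and a placeholder DIRECT_LINE_SECRET environment variable:

// Sketch only: the route and DIRECT_LINE_SECRET env var are placeholders.
server.post('/directline/token', async (req, res) => {
    // Exchange the Direct Line secret for a token so the secret itself
    // never appears in the HTML.
    const dlRes = await fetch('https://directline.botframework.com/v3/directline/tokens/generate', {
        method: 'POST',
        headers: { Authorization: `Bearer ${ process.env.DIRECT_LINE_SECRET }` }
    });

    if (!dlRes.ok) {
        res.send(500, { error: 'Failed to generate Direct Line token.' });
        return;
    }

    // The response body includes token, conversationId, and expires_in;
    // the page above only reads token.
    res.send(200, await dlRes.json());
});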

Additional resources:


Hope this helps!