I'm trying to remove the automatic breaks added by the synthesis processor, to create speech files without any "linguistic pauses".
I'm using Microsoft's speech synthesis engine with the SpeechSynthesizer class in C#.
This is the output I get with "This is an example why do automatic breaks occur?" wrapped in <speak> tags with SpeechSynthesizer:
This is the output I want (achieved by using Oddcast's TTS Demo):
I've read through w3.org's SSML documentation several times. Section 3.2.3 (break element) notes the following:
If the element is not present between tokens, the synthesis processor is expected to automatically determine a break based on the linguistic context. In practice, the break element is most often used to override the typical automatic behavior of a synthesis processor.
This is how my voice is currently behaving. I want to somehow override or turn off this functionality, and have the speech be completely uninterrupted. To override it as the passage above describes, I have tried putting a <break> element with attributes strength="none" and time="0ms" between the words where this automatic break occurs. I have also tried all kinds of other things, such as wrapping the whole text string in <s> tags, to no avail.
I also can't just remove the breaks in post-processing, since the voice speaks the surrounding words with a different tone when the automatic breaks are added.
I have read through several other SSML documents which, while often worded a bit differently from the w3 docs, don't explain concretely how to override the automatic breaks, which is my issue.
In my experiments with SpeechSynthesizer, if you put a break of 50ms at the end, it will respect it; if it's shorter, it'll be ignored. However, it will always treat <speak>-wrapped content as its own clause, so it will speak it as if it's a separate sentence/clause, rather than carrying the prosody through like the 2nd example. You need to send all your text in a single <speak> element (and voice) to have it treated as a single linguistic utterance.
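As a rough sketch of that, assuming the System.Speech assembly on Windows: the whole text goes into one <speak> element, followed by a 50ms break, and is sent in a single SpeakSsml call. The BuildSsml helper, the class name, the en-US language tag and the output.wav path are my own placeholders, and I'm reading "at the end" as "after the last word", so adjust as needed.

```csharp
using System;
using System.Security;          // SecurityElement.Escape
using System.Speech.Synthesis;  // Windows only; needs a reference to System.Speech

static class SingleUtteranceDemo
{
    // Hypothetical helper: wraps ALL the text in one <speak> element and
    // appends a short trailing break (50 ms is the smallest value that was
    // respected in my experiments; this is an observation, not a documented limit).
    public static string BuildSsml(string text, int trailingBreakMs = 50)
    {
        // Escape &, <, >, " and ' so the text cannot break the SSML document.
        string escaped = SecurityElement.Escape(text);
        return
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
            "xml:lang=\"en-US\">" +
            escaped +
            "<break time=\"" + trailingBreakMs + "ms\"/>" +
            "</speak>";
    }

    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Assumed output path; the installed default voice is used.
            synth.SetOutputToWaveFile("output.wav");

            // One SpeakSsml call with one <speak> element, so the engine
            // sees the text as a single utterance.
            synth.SpeakSsml(BuildSsml("This is an example why do automatic breaks occur?"));
        }
    }
}
```

Splitting the text over several SpeakSsml calls, or several <speak> documents, would give each piece its own clause-level prosody, which is the effect described above.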