My code is the following:
import boto3

polly_client = boto3.Session().client('polly')
response = polly_client.synthesize_speech(
    VoiceId='Joanna',
    OutputFormat='mp3',
    Text=sentence
)
audio = response['AudioStream']
I've tried using the following sentence:
sentence = '''<speak><s>Mary had a little lamb</s> <s>Whose fleece was white as snow</s>And everywhere that Mary went, the lamb was sure to go.</speak>'''
but the generated audio doesn't have the pause; it just reads out the text.
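Polly treats Text as plain text by default, so the request has to pass TextType='ssml' for the tags to be interpreted, and an explicit pause is requested with a <break> tag. This generates an audio file for the phrase "Hello world" with a 2-second pause between the words: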
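A minimal sketch of such a request, assuming the pause comes from an SSML `<break time="2s"/>` tag and that `TextType='ssml'` is set on the call. The helper names `build_ssml` and `synthesize_to_file` and the output file name `hello_world.mp3` are hypothetical, chosen only for illustration:

```python
from xml.sax.saxutils import escape


def build_ssml(first: str, second: str, pause_seconds: int = 2) -> str:
    """Wrap two text fragments in <speak> with a <break> between them."""
    # escape() protects characters such as & and < that would break the XML.
    return (
        f"<speak>{escape(first)}"
        f'<break time="{pause_seconds}s"/>'
        f"{escape(second)}</speak>"
    )


def synthesize_to_file(ssml: str, path: str = "hello_world.mp3") -> None:
    """Send the SSML to Polly and write the returned MP3 stream to disk."""
    # Imported here so build_ssml can be used without boto3 installed.
    import boto3

    polly_client = boto3.Session().client("polly")
    response = polly_client.synthesize_speech(
        VoiceId="Joanna",
        OutputFormat="mp3",
        TextType="ssml",  # without this, Text is handled as plain text
        Text=ssml,
    )
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
```

Calling `synthesize_to_file(build_ssml("Hello", "world"))` with valid AWS credentials configured should write the audio to `hello_world.mp3`. The same `TextType="ssml"` argument applies to the original `sentence` with its `<s>` tags.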
See the subsections under "Using SSML" in the AWS documentation for more details.