When using the synthesizeToFile method of Android's TextToSpeech, how can we know what file format (WAV, MP3, OGG) and what attributes (sample rate, bit depth, etc.) the resulting file will have?
I can't find an explicit standard in the documentation... it doesn't even promise any particular file format such as WAV.
Is this simply up to the speech engine to implement however they choose?
What if we want to do something with the result, like calculate the duration of the file? We would have to know the details about the file format in advance. This is made even more unpredictable by the fact that there's no way to know what engine is installed/running on the end user's device.
Is there really no standard for this?
The Android documentation for synthesizeToFile does suggest a format: the example given for the filename parameter ends in .wav.
The audio attributes depend on your input source, or you can influence them through the Voice you select. You can also inspect the audio file once it has been saved successfully. For example, you can use MediaPlayer to get the format, duration, bitrate, and so on. You can also use AudioTrack to play the raw data by reading the audio buffer; AudioTrack is the standard way to play raw audio bytes.
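If the engine does write a WAV file, the attributes can also be read straight from the file header, without any Android classes. The following is only a minimal sketch under some assumptions: the file is a canonical RIFF/WAVE file containing uncompressed PCM, and the engine has written a correct size into the data chunk. WavInfo is a hypothetical helper name, not part of the Android SDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not an Android API): reads basic attributes from a PCM WAV header. */
final class WavInfo {
    final int channels;
    final int sampleRate;
    final int bitsPerSample;
    final long dataBytes;

    private WavInfo(int channels, int sampleRate, int bitsPerSample, long dataBytes) {
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
        this.dataBytes = dataBytes;
    }

    /** Duration = data bytes / (sample rate * bytes per frame). */
    double durationSeconds() {
        int bytesPerFrame = channels * (bitsPerSample / 8);
        return (double) dataBytes / ((double) sampleRate * bytesPerFrame);
    }

    /** Walks the RIFF chunks until "data" is found; assumes uncompressed PCM (format tag 1). */
    static WavInfo read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] riff = new byte[12];
        data.readFully(riff);
        if (!tag(riff, 0).equals("RIFF") || !tag(riff, 8).equals("WAVE")) {
            throw new IOException("Not a RIFF/WAVE file");
        }
        int channels = 0, sampleRate = 0, bitsPerSample = 0;
        boolean haveFmt = false;
        byte[] chunkHeader = new byte[8];
        while (true) {
            data.readFully(chunkHeader); // throws EOFException if there is no data chunk
            String id = tag(chunkHeader, 0);
            long size = ByteBuffer.wrap(chunkHeader, 4, 4)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
            if (id.equals("fmt ")) {
                if (size < 16 || size > 1024) throw new IOException("Unexpected fmt chunk size");
                byte[] fmt = new byte[(int) size];
                data.readFully(fmt);
                ByteBuffer b = ByteBuffer.wrap(fmt).order(ByteOrder.LITTLE_ENDIAN);
                if ((b.getShort(0) & 0xFFFF) != 1) {
                    throw new IOException("Only uncompressed PCM is handled in this sketch");
                }
                channels = b.getShort(2) & 0xFFFF;
                sampleRate = b.getInt(4);
                bitsPerSample = b.getShort(14) & 0xFFFF;
                if (channels == 0 || sampleRate <= 0 || bitsPerSample < 8) {
                    throw new IOException("Invalid fmt chunk");
                }
                haveFmt = true;
                skipFully(data, size & 1); // chunks are padded to an even length
            } else if (id.equals("data")) {
                if (!haveFmt) throw new IOException("data chunk appeared before fmt chunk");
                return new WavInfo(channels, sampleRate, bitsPerSample, size);
            } else {
                skipFully(data, size + (size & 1)); // ignore chunks such as LIST
            }
        }
    }

    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }

    private static void skipFully(DataInputStream data, long n) throws IOException {
        while (n > 0) {
            int skipped = data.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
            if (skipped <= 0) {
                data.readByte(); // forces an EOFException at end of stream
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

On a device you would pass a FileInputStream for the synthesized file, once the utterance has finished, to WavInfo.read and then call durationSeconds(). If the engine produces something other than PCM WAV, the sketch throws an IOException rather than guessing, which is a reasonable signal to fall back to the platform media APIs.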