I am using the 2.1 release of Festival. I was able to install and use the 172 MB voice with
(voice_cmu_us_slt_arctic_clunits)
The quality improved significantly but is still far from what I want. I believe the synthesis still uses a lot of defaults. Is it possible to tune this further (e.g. to get close to the quality of the qwiki.com engine)? I understand that I need the right combination of
- Synthesis method
- Intonation/duration settings
- Audio output parameters
- anything else?
but it is very difficult to find all the details, so progress is quite slow (a sketch of the kind of settings I mean is below).
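For reference, this is roughly the kind of tuning I have in mind, based on what I could find in the manual (Duration_Stretch and int_lr_params are documented Festival parameters; the values here are only guesses, and I am not even sure int_lr_params applies to a clunits voice):

; stretch all segment durations by 10%
(Parameter.set 'Duration_Stretch 1.1)
; mean/std-dev F0 targets for the simple LR intonation model (values are guesses)
(set! int_lr_params
  '((target_f0_mean 105) (target_f0_std 14)
    (model_f0_mean 170) (model_f0_std 34)))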
Any tips, links to tutorials/docs (even for an older version, as long as they give some theory overview), or Scheme snippets would be appreciated.
PS
Please note that so far I am not interested in tuning the algorithms themselves (e.g. training a new voice model with Sphinx).
To generate speech I use commands like
(SayText "This is a short introduction ...")
and
./text2wave -eval '(voice_cmu_us_slt_arctic_clunits)' TEXT > output.wav
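From the interactive festival prompt I can also save a waveform directly; a minimal sketch, assuming utt.synth and utt.save.wave behave as described in the manual (utt1 is just a variable name I made up):

; synthesise the text with whatever voice/parameters are currently set
(set! utt1 (utt.synth (Utterance Text "This is a short introduction ...")))
; save the resulting waveform as a RIFF (.wav) file
(utt.save.wave utt1 "output.wav" 'riff)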