Google Translate API Text to speech - setting encoding for non-roman characters

I'm using Google Translate's unofficial Text-to-speech API (I've posted more info on it here).

The API endpoint looks like: https://translate.google.com/translate_tts?ie=utf-8&tl=en&q=Hello%20World

When I request words from the browser in the usual way, I get No 'Access-Control-Allow-Origin' errors and 404 responses. To get around this, I've followed the PHP script in this blog, which strips out the referrer before making the request (more info on my attempts here).

I'm able to get English to work, but I need this to work for Chinese. Unfortunately, when I pass in something like 你好, the voice narrates gibberish. However, if you paste the following URL directly into your browser's address bar, it narrates perfectly.

https://translate.google.com/translate_tts?ie=utf-8&tl=zh-CN&q=你好

HTML:

<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta http-equiv="content-type" content="text/html; charset=utf-8" />

<audio controls="controls" autoplay="autoplay" style="display:none;">
    <source src="testPHP.php?translate_tts?ie=utf-8&tl=zh-CN&q=你好" type="audio/mpeg" />
</audio>

testPHP.php:

<?php
//https://translate.google.com/translate_tts?ie=UTF-8&q=' + text + '&tl=en
header('Content-type: text/plain; charset=utf-8');
$params = http_build_query(array("ie" => $_GET['ie'],"tl" => $_GET["tl"], "q" => $_GET["q"]));
$ctx = stream_context_create(array("http"=>array("method"=>"GET","header"=>"Referer: \r\n"))); //create and return stream context
$soundfile = file_get_contents("https://translate.google.com/translate_tts?".$params, false, $ctx); //reads file into string (string with params[utf-8, tl, q], use include path bool, custom context resource headers)

header("Content-type: audio/mpeg");
header("Content-Transfer-Encoding: binary");
header('Pragma: no-cache');
header('Expires: 0');

echo($soundfile);

Running tail -f on the Apache access log shows:

"GET /testPHP.php?translate_tts?ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD HTTP/1.1" 200 13536

This seems okay. As you can see, the q query param value, 你好, has been percent-encoded. This is fine, because the encoded URL still works if you put it in the browser:

https://translate.google.com/translate_tts?ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD

Running tail -f on the Apache error log shows:

PHP Notice: Undefined index: ie in /Users/danturcotte/Sites/personal_practice/melonJS-dev/testPHP.php on line 4, referer: http://melon.localhost/

I'm not sure how this is happening, or whether it's contributing to the bad pronunciation. Could the voice be reading out parts of the ie parameter?

The query params from the browser side seem to be registering.

And as you can see from the Apache access log, the ie=utf-8 param is being sent fine.

So my questions are:

  • I've added header('Content-type: text/plain; charset=utf-8'); to my testPHP.php file to ensure that the encoding is going through fine. Could this be contributing to the problem?

  • I'm building the URI query string like this: $params = http_build_query(array("ie" => $_GET['ie'],"tl" => $_GET["tl"], "q" => $_GET["q"]));, so how can there be an undefined index ie?

Answer by Mike (best answer):

The problem is in your URL:

GET /testPHP.php?translate_tts?ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD

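You have two question marks. Only the first one starts the query string, so the second becomes part of the first parameter's name, and PHP's $_GET will contain: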

Array 
( 
[translate_tts?ie] => utf-8 
[tl] => zh-CN 
[q] => 你好 
)
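
This can be checked outside Apache. The following is a minimal sketch, assuming that parse_str() splits a query string the same way PHP does when it fills $_GET; the variable names $query and $params are only illustrative and are not part of the original script.

```php
<?php
// Everything after the FIRST "?" in the request URL is the query string.
$query = 'translate_tts?ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD';

// parse_str() decodes a query string into an array, much like $_GET.
$params = array();
parse_str($query, $params);

// The first key is "translate_tts?ie", so there is no plain "ie" key.
print_r(array_keys($params));
var_dump(isset($params['ie']));                  // bool(false)

// q itself arrives intact: these are the UTF-8 bytes of 你好.
var_dump($params['q'] === "\xE4\xBD\xA0\xE5\xA5\xBD"); // bool(true)
```

So the Chinese text is decoded correctly; what goes missing is the ie key, which is what the Undefined index notice is reporting.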

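There is no ie key, hence the Undefined index: ie notice. Instead, make translate_tts an ordinary parameter (or drop it, since testPHP.php never reads it) and join the rest with &, something like: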

GET /testPHP.php?translate_tts=value&ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD
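
As for why the missing key would lead to gibberish: http_build_query() skips null values, so when $_GET['ie'] is undefined the URL sent to Google has no ie parameter at all. The sketch below shows the difference. It assumes the same three keys as testPHP.php; $broken, $get and $fixed are illustrative names. That the unofficial endpoint then falls back to a non-UTF-8 default encoding is my assumption, not something the endpoint documents.

```php
<?php
// UTF-8 bytes of 你好, written as escapes so the file encoding does not matter.
$text = "\xE4\xBD\xA0\xE5\xA5\xBD";

// What testPHP.php effectively builds when $_GET['ie'] is undefined (null):
$broken = http_build_query(array('ie' => null, 'tl' => 'zh-CN', 'q' => $text));
echo $broken, "\n"; // tl=zh-CN&q=%E4%BD%A0%E5%A5%BD  (no ie parameter)

// With a single "?" in the request, all three keys are present.
$get = array();
parse_str('ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD', $get);
$fixed = http_build_query(array('ie' => $get['ie'], 'tl' => $get['tl'], 'q' => $get['q']));
echo $fixed, "\n";  // ie=utf-8&tl=zh-CN&q=%E4%BD%A0%E5%A5%BD
```

The early header('Content-type: text/plain; charset=utf-8') call is unlikely to matter either way, because header() replaces an earlier header of the same name by default, so the later audio/mpeg value is the one that gets sent.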