Google Cloud Text-to-Speech in PHP

I am trying to use Google's Text-to-Speech API in my PHP website, which will be hosted on a live cPanel server.

I have enabled the Text-to-Speech API, created an API key in the Credentials section, and downloaded the JSON credentials file from the Create service account key page.

Then I downloaded the sample files from GitHub and used Composer to install the library.

Now I don't understand where to put my keys. Every guide requires exporting the key in the shell, but that only lasts for one open command prompt session and has to be repeated every time.

Since I want to run this code on live cPanel-based hosting, I don't think exporting will be possible.

Is there any place within the code where I can pass the key?

In a related Stack Overflow question, the first answer writes the cURL response to synthesize-text.txt, but I need MP3 output.

Another answer says to use jq, but since this is a shared hosting server, I am not sure jq is available.

Is there any way around this problem?


Update

I tried the following code after referring to the answer by @V.Tur:

$params = [
    "audioConfig"=>[
        "audioEncoding"=>"MP3",
        "pitch"=> "1",
        "speakingRate"=> "1",
        "effectsProfileId"=> [
            "medium-bluetooth-speaker-class-device"
          ]
    ],
    "input"=>[
        "ssml"=>'<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
                  standard <break time=\"1s\"/>is defined by the
                  <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>'
    ],
    "voice"=>[
        "languageCode"=> "hi-IN",
        "name" =>"hi-IN-Wavenet-B",
        'ssmlGender'=>'MALE'
    ]
];
$data_string = json_encode($params);
$speech_api_key = "My_Key_Here";
$url = 'https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key=' . $speech_api_key;
$handle = curl_init($url);

curl_setopt($handle, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($handle, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data_string)
]);
$response = curl_exec($handle);
$responseDecoded = json_decode($response, true);
curl_close($handle);
if($responseDecoded['audioContent']){
    return $responseDecoded['audioContent'];
}

The audio downloads, but the pauses/breaks I specified in the SSML did not work. I then tried passing the data to $params as a string, as below:

$params = "{
    'input':{
     'ssml':'<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
          standard <break time=\"1s\"/>is defined by the
          <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>'
    },
    'voice':{
      'languageCode':'en-us',
      'name':'en-US-Standard-B',
      'ssmlGender':'MALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
}";

But I get the following error:

Array ( [error] => Array ( [code] => 400 [message] => Invalid JSON payload received. Unknown name "": Root element must be a message. [status] => INVALID_ARGUMENT [details] => Array ( [0] => Array ( [@type] => type.googleapis.com/google.rpc.BadRequest [fieldViolations] => Array ( [0] => Array ( [description] => Invalid JSON payload received. Unknown name "": Root element must be a message. ) ) ) ) ) )

How can I solve this?

1 Answer

Answered by V.Tur

Below is my working text-to-speech example; you can adapt it to your needs:

public static function getSound($text)
{
    $text = trim($text);

    if ($text === '') {
        return false;
    }

    // Replace with your own API key, ideally loaded from a config file
    // stored outside the web root rather than hard-coded here.
    $speech_api_key = 'YOUR_API_KEY';

    $params = [
        "audioConfig" => [
            // MP3 matches the .mp3 extension used when the file is saved;
            // LINEAR16 would return PCM audio with a WAV header instead.
            "audioEncoding" => "MP3",
            "pitch" => "1",
            "speakingRate" => "1",
            "effectsProfileId" => [
                "medium-bluetooth-speaker-class-device"
            ]
        ],
        "input" => [
            "text" => $text
        ],
        "voice" => [
            "languageCode" => "en-US",
            "name" => "en-US-Wavenet-F"
        ]
    ];

    $data_string = json_encode($params);

    $url = 'https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key=' . $speech_api_key;
    $handle = curl_init($url);

    curl_setopt($handle, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($handle, CURLOPT_POSTFIELDS, $data_string);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_HTTPHEADER, [
        'Content-Type: application/json',
        'Content-Length: ' . strlen($data_string)
    ]);
    $response = curl_exec($handle);
    curl_close($handle);

    // curl_exec() returns false on a transport failure.
    if ($response === false) {
        return false;
    }

    $responseDecoded = json_decode($response, true);

    // audioContent holds the base64-encoded audio; it is absent on API errors.
    if (!empty($responseDecoded['audioContent'])) {
        return $responseDecoded['audioContent'];
    }

    return false;
}
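
getSound() needs `$speech_api_key` to be available at runtime. On shared hosting where exporting a shell variable is not practical, one option is to keep the key in a small PHP config file stored outside the public web root and load it when needed. The sketch below is only an illustration: `loadSpeechApiKey()`, `buildSynthesizeUrl()` and the `speech_api_key` array key are hypothetical names, not part of any Google library.

```php
<?php
// Hypothetical helper: reads the API key from a PHP config file whose
// body is: return ['speech_api_key' => '...'];
function loadSpeechApiKey(string $configPath): string
{
    if (!is_readable($configPath)) {
        throw new RuntimeException("Config file not readable: $configPath");
    }

    // A PHP file that ends in "return [...]" hands that array to require.
    $config = require $configPath;

    if (!is_array($config) || empty($config['speech_api_key'])) {
        throw new RuntimeException('speech_api_key is missing from the config file');
    }

    return $config['speech_api_key'];
}

// Hypothetical helper: builds the same REST URL that getSound() uses.
// The key is URL-encoded in case it contains reserved characters.
function buildSynthesizeUrl(string $apiKey): string
{
    return 'https://texttospeech.googleapis.com/v1/text:synthesize'
        . '?fields=audioContent&key=' . rawurlencode($apiKey);
}
```

Inside getSound(), the URL could then be built with something like `buildSynthesizeUrl(loadSpeechApiKey(__DIR__ . '/../speech-config.php'))`, where the path is an assumption about where the config file lives.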

Usage:
public static function saveSound($text)
{
    $speech_data = SpeechAPI::getSound($text); // see the method above

    if ($speech_data) {
        $file_name = strtolower(md5(uniqid($text)) . '.mp3');
        $path = FileUpload::getFolder(); // just returns the directory path

        // The API returns base64-encoded audio, so decode it before writing.
        if (file_put_contents($path . $file_name, base64_decode($speech_data))) {
            return $file_name;
        }
    }

    return null;
}
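
When a request is rejected, the API responds with a JSON object containing an `error` key instead of `audioContent` (the error dump in the question's update has this shape). A minimal sketch of a decoder that surfaces that message instead of silently returning nothing; `decodeSynthesisResponse()` is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: turns the raw JSON response body into binary audio,
// or throws an exception carrying the API's own error message.
function decodeSynthesisResponse(string $responseBody): string
{
    $decoded = json_decode($responseBody, true);

    if (!is_array($decoded)) {
        throw new RuntimeException('Response is not a JSON object');
    }

    if (isset($decoded['error'])) {
        $message = $decoded['error']['message'] ?? 'Unknown API error';
        $code = (int) ($decoded['error']['code'] ?? 0);
        throw new RuntimeException($message, $code);
    }

    if (empty($decoded['audioContent'])) {
        throw new RuntimeException('Response contains no audioContent');
    }

    // audioContent is base64-encoded; strict mode rejects malformed input.
    $audio = base64_decode($decoded['audioContent'], true);
    if ($audio === false) {
        throw new RuntimeException('audioContent is not valid base64');
    }

    return $audio;
}
```

The returned bytes can be passed straight to `file_put_contents()`, and the exception message can be logged to see why a request failed.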

For SSML input, change the input parameters:

$text = "<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
            standard <break time=\"1s\"/>is defined by the
            <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>";
$params = [
    "audioConfig" => [
        // MP3 if the result is saved as .mp3; LINEAR16 returns WAV data.
        "audioEncoding" => "MP3",
        "pitch" => "1",
        "speakingRate" => "1",
        "effectsProfileId" => [
            "medium-bluetooth-speaker-class-device"
        ]
    ],
    "input" => [
        //"text" => $text
        "ssml" => $text
    ],
    "voice" => [
        "languageCode" => "en-US",
        "name" => "en-US-Wavenet-F"
    ]
];
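
Two details in the question's update are worth comparing with the snippet above. First, in a single-quoted PHP string `\"` stays a literal backslash followed by a quote, so the SSML sent to the API contains stray backslashes; that may be why the `<break>` had no effect. Second, passing a hand-written string to `json_encode()` produces a JSON string rather than a JSON object, which is consistent with the `Root element must be a message` error. A small sketch of both points using only built-in functions (the variable names are illustrative):

```php
<?php
// Single quotes: the backslash is kept literally.
$ssmlSingle = '<break time=\"1s\"/>';
// Double quotes: \" is an escape sequence for a plain quote.
$ssmlDouble = "<break time=\"1s\"/>";
// Single quotes with unescaped double quotes: same result as $ssmlDouble.
$ssmlClean  = '<break time="1s"/>';

var_dump($ssmlSingle === $ssmlDouble); // bool(false)
var_dump($ssmlDouble === $ssmlClean);  // bool(true)

// Encoding an array yields a JSON object at the root.
$asArray = json_encode([
    'input' => ['ssml' => '<speak>Hi ' . $ssmlClean . ' there</speak>'],
]);

// Encoding a string yields a JSON string literal (it starts with a quote),
// so the root of the payload is a string, not an object.
$asString = json_encode("{'input':{'ssml':'...'}}");

echo $asArray[0], PHP_EOL;  // {
echo $asString[0], PHP_EOL; // "
```

Building `$params` as a PHP array and letting `json_encode()` handle the quoting avoids both problems.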

For choosing an audioEncoding value, see https://cloud.google.com/speech-to-text/docs/encoding
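
The encoding also determines what kind of file the decoded bytes represent, so the saved file's extension should follow it. A minimal sketch, assuming a hypothetical helper `extensionForEncoding()`; the mapping covers `MP3`, `LINEAR16` (which the API returns with a WAV header) and `OGG_OPUS`, and falls back to a generic extension for anything else.

```php
<?php
// Hypothetical helper: picks a file extension that matches the
// audioEncoding value sent in the request.
function extensionForEncoding(string $audioEncoding): string
{
    $map = [
        'MP3'      => 'mp3',
        'LINEAR16' => 'wav', // PCM samples returned with a WAV header
        'OGG_OPUS' => 'ogg', // Opus audio in an Ogg container
    ];

    // Unknown encodings get a neutral extension instead of a wrong one.
    return $map[strtoupper($audioEncoding)] ?? 'bin';
}

echo extensionForEncoding('MP3'), PHP_EOL;      // mp3
echo extensionForEncoding('linear16'), PHP_EOL; // wav
```

In saveSound(), the hard-coded '.mp3' could then become '.' followed by the result of this helper, called with whatever encoding the request used.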