Trying to build an app to convert real-time speech to text

Question


257 views · Asked by Sandeep pradeep on 12 November 2023 at 08:57

I have been trying to build a real-time speech-to-text app with React as the front end and Python Flask as the back end, using a socket connection to send audio between them in real time. I have tried many approaches, but either the audio is not transcribed correctly or no result is printed at all. The Flask server receives the audio data continuously as bytes and uses a push audio stream from the Azure Speech SDK for Python to create an AudioInputStream, which is then passed in when configuring the SDK's conversation_transcriber / SpeechRecognizer. The results are not satisfactory. Please help with a suitable solution.

I need the output to be the text of the speech that was given as input through the React front end, with Flask as the back end.

Original Q&A

There is 1 answer

**Pavan** · Answer 1 · 2023-11-20T13:10:11+00:00

The code below is for a speech-to-text application using React as the frontend, Flask as the backend, and Socket.IO.

This sample shows how to transcribe audio using Azure Speech to Text.

import io

from flask import Flask
from flask_socketio import SocketIO, emit
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, ResultReason
from azure.cognitiveservices.speech.audio import (
    AudioConfig,
    AudioStreamFormat,
    PullAudioInputStream,
    PullAudioInputStreamCallback,
)

app = Flask(__name__)
# Allow the React dev server (a different origin) to connect; tighten this outside development.
socketio = SocketIO(app, cors_allowed_origins="*")

# Set up your Speech Config
speech_config = SpeechConfig(subscription="AzureSpeechKey", region="AzureSpeechregion")

# Assumed input: raw 16 kHz, 16-bit, mono PCM. Adjust to match what the client actually sends.
AUDIO_FORMAT = AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)

class StreamBuffer(PullAudioInputStreamCallback):
    """Lets the Speech SDK pull audio bytes from an in-memory stream."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def read(self, buffer: memoryview) -> int:
        # Copy up to buffer.nbytes bytes into the SDK's buffer; returning 0 signals end of stream.
        data = self.stream.read(buffer.nbytes)
        buffer[:len(data)] = data
        return len(data)

    def close(self):
        self.stream.close()

@socketio.on('audio')
def handle_audio(audio_data):
    # Wrap the received bytes in a pull stream the recognizer can read from.
    callback = StreamBuffer(io.BytesIO(audio_data))
    pull_stream = PullAudioInputStream(callback, AUDIO_FORMAT)
    audio_config = AudioConfig(stream=pull_stream)

    # Configure your speech_recognizer
    speech_recognizer = SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Process audio stream (recognizes a single utterance from this message)
    result = speech_recognizer.recognize_once()

    # Emit the result back to the client that sent the audio
    if result.reason == ResultReason.RecognizedSpeech:
        emit('transcription', result.text)
    elif result.reason == ResultReason.NoMatch:
        emit('transcription', "No speech could be recognized")
    elif result.reason == ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        emit('transcription', "Speech Recognition canceled: {}".format(cancellation_details.reason))

if __name__ == '__main__':
    socketio.run(app, debug=True)
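
The handler above calls recognize_once on each incoming message, which recognizes a single utterance per call. For audio that arrives continuously, as the question describes, a push stream combined with continuous recognition may be a closer fit. The following is a minimal sketch under stated assumptions, not a definitive implementation: `StreamingSession` and `open_azure_session` are hypothetical names, and the client is assumed to send raw 16 kHz, 16-bit, mono PCM. If the browser sends another format (for example the compressed output of MediaRecorder), it would need converting first.

```python
from typing import Callable


class StreamingSession:
    """Hypothetical helper: one push stream plus one continuous recognizer per client."""

    def __init__(self, push_stream, recognizer, on_text: Callable[[str], None]):
        self.push_stream = push_stream
        self.recognizer = recognizer
        self._on_text = on_text
        # `recognized` fires once per finalized utterance.
        recognizer.recognized.connect(self._handle_recognized)
        recognizer.start_continuous_recognition()

    def _handle_recognized(self, evt) -> None:
        # Skip empty results (for example, silence reported as no match).
        if evt.result.text:
            self._on_text(evt.result.text)

    def feed(self, chunk: bytes) -> None:
        """Append one chunk of PCM audio received from the client."""
        self.push_stream.write(chunk)

    def stop(self) -> None:
        """Signal end of audio, then stop the recognizer."""
        self.push_stream.close()
        self.recognizer.stop_continuous_recognition()


def open_azure_session(key: str, region: str, on_text: Callable[[str], None]) -> StreamingSession:
    """Build a session backed by the Azure Speech SDK (imported lazily)."""
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: raw 16 kHz, 16-bit, mono PCM from the client.
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    return StreamingSession(push_stream, recognizer, on_text)
```

One way to wire this into Flask-SocketIO is to keep a dictionary of sessions keyed by `request.sid`: open a session in the 'connect' handler, call `feed` in the 'audio' handler, and call `stop` in the 'disconnect' handler. Because the recognizer invokes the callback on its own thread, outside any request context, the callback would send results with `socketio.emit('transcription', text, to=sid)` rather than the context-bound `emit`.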


import React, { useState } from 'react';
import axios from 'axios';
import Dropzone from 'react-dropzone';
import './App.css';

const App = () => {
  const [transcription, setTranscription] = useState('');
  const [file, setFile] = useState(null);

  // Keep only the first dropped file.
  const onDrop = (acceptedFiles) => {
    setFile(acceptedFiles[0]);
  };

  // Upload the selected file and show the transcription returned by the server.
  const onTranscribe = async () => {
    const formData = new FormData();
    formData.append('audio', file);

    try {
      const response = await axios.post('http://localhost:5000/api/transcribe', formData, {
        headers: {
          'Content-Type': 'multipart/form-data',
        },
      });
      setTranscription(response.data.transcription);
    } catch (error) {
      console.error('Error transcribing audio:', error.message);
    }
  };

  return (
    <div className="App">
      <h1>Azure Speech to Text</h1>
      <Dropzone onDrop={onDrop}>
        {({ getRootProps, getInputProps }) => (
          <div {...getRootProps()} className="dropzone">
            <input {...getInputProps()} />
            <p>Drag & drop an audio file here, or click to select one</p>
          </div>
        )}
      </Dropzone>

      {file && <p>Selected File: {file.name}</p>}

      <button onClick={onTranscribe} disabled={!file}>
        Transcribe
      </button>

      {transcription && (
        <div className="transcription">
          <h2>Transcription:</h2>
          <p>{transcription}</p>
        </div>
      )}
    </div>
  );
};

export default App;
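
The React component posts the file to http://localhost:5000/api/transcribe and reads `response.data.transcription`, but the Flask code in this answer only defines a Socket.IO 'audio' handler. A route along the following lines could serve that request. This is a sketch under assumptions, not the answer's own code: `parse_pcm_wav` and `create_app` are hypothetical names, and the upload is assumed to be an uncompressed PCM WAV file.

```python
import io
import wave
from typing import Optional, Tuple


def parse_pcm_wav(data: bytes) -> Optional[Tuple[int, int, int, bytes]]:
    """Return (sample_rate, bits_per_sample, channels, frames) for a PCM WAV, else None."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.readframes(wav.getnframes())
            if not frames:
                return None
            return (wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels(), frames)
    except (wave.Error, EOFError):
        # Not a WAV file, a compressed WAV, or a truncated upload.
        return None


def create_app(speech_key: str, speech_region: str):
    """Build a Flask app exposing POST /api/transcribe (dependencies imported lazily)."""
    import azure.cognitiveservices.speech as speechsdk
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)

    @app.route("/api/transcribe", methods=["POST"])
    def transcribe():
        upload = request.files.get("audio")  # field name used by the React form
        parsed = parse_pcm_wav(upload.read()) if upload else None
        if parsed is None:
            return jsonify(error="Expected a PCM WAV file in the 'audio' field"), 400
        rate, bits, channels, frames = parsed

        # Feed the decoded samples through a push stream that matches the file's format.
        fmt = speechsdk.audio.AudioStreamFormat(
            samples_per_second=rate, bits_per_sample=bits, channels=channels
        )
        push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
        push_stream.write(frames)
        push_stream.close()

        recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config,
            audio_config=speechsdk.audio.AudioConfig(stream=push_stream),
        )
        result = recognizer.recognize_once()  # first utterance only
        recognized = result.reason == speechsdk.ResultReason.RecognizedSpeech
        return jsonify(transcription=result.text if recognized else "")

    return app
```

Running it would look like `create_app("AzureSpeechKey", "AzureSpeechregion").run(port=5000)`. A request from the React dev server is cross-origin, so the response would also need CORS headers, for example via the Flask-CORS package. Since recognize_once returns after the first utterance, longer files would call for continuous recognition instead.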
