My goal is to build a website that streams audio data from the microphone to the backend for processing and real-time responses (real-time transcription, for example). Currently, my project has a React.js frontend with a Flask backend (all my preprocessing is in Python), and I found a great Medium tutorial about this specific task here:
Now, I have replicated the frontend code. The relevant code for this task is:
const socket = io('http://localhost:5000');

// ## some other code ##

navigator.mediaDevices.getUserMedia({
    audio: true
}).then(function (stream) {
    recordAudio = RecordRTC(stream, {
        type: 'audio',
        mimeType: 'audio/webm',
        // sampleRate is used by StereoAudioRecorder; valid range is 22050 to 96000
        sampleRate: 44100,
        // force 16 kHz recording for speech recognition
        desiredSampRate: 16000,
        // other recorderType options: MediaStreamRecorder, WebAssemblyRecorder,
        // CanvasRecorder, GifRecorder, WhammyRecorder
        recorderType: StereoAudioRecorder,
        // Dialogflow / STT requires mono audio
        numberOfAudioChannels: 1,
        // fire ondataavailable with a fresh blob every 100 ms
        timeSlice: 100,
        ondataavailable: function (blob) {
            // use socket.io-stream for bi-directional streaming:
            // create a stream, announce it to the server, then pipe the blob in
            var audioStream = ss.createStream();
            ss(socket).emit('stream', audioStream, {
                name: 'stream.wav',
                size: blob.size
            });
            ss.createBlobReadStream(blob).pipe(audioStream);
            console.log('Sent some data, hopefully');
        }
    });
    recordAudio.startRecording();
});
Now, my Flask backend is able to accept a connection from the frontend, but it never sees any of the 'stream' emits carrying the audio data. Basically, my goal is to replicate the next part of the tutorial:
which creates an Express server and does some NLP tasks. My goal is to run the stream through Google Cloud Speech-to-Text on the Flask backend and emit the transcription results in real time to the React frontend. I looked, and Google has a tutorial for both Node.js and Python here:
where the Python code wraps PyAudio in a MicrophoneStream that acts as a generator and passes it into the Google Cloud client:
with MicrophoneStream(RATE, CHUNK) as stream:
audio_generator = stream.generator()
requests = (
speech.StreamingRecognizeRequest(audio_content=content)
for content in audio_generator
)
responses = client.streaming_recognize(streaming_config, requests)
# Now, put the transcription responses to use.
listen_print_loop(responses)
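For reference, the receiving side I have so far is essentially the minimal Flask-SocketIO sketch below (the handler names are mine). The connect handler fires, but the 'stream' handler never does; I suspect socket.io-stream wraps its payload in its own framing rather than emitting a plain 'stream' event:

# Minimal receiving side, assuming Flask-SocketIO
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins='*')

@socketio.on('connect')
def on_connect():
    print('client connected')       # this fires

@socketio.on('stream')
def on_stream(data, meta):
    # expected: a binary audio chunk plus the {name, size} metadata
    print('received chunk', meta)   # this never fires

if __name__ == '__main__':
    socketio.run(app, port=5000)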
My question is: how can I have Flask accept the BlobReadStream data from the frontend and expose it as a Python generator, so that I can feed the data into Google Cloud? One idea I have considered is using async or threads to fill a queue of blobs, like the MicrophoneStream in the Google Cloud tutorial does, while another thread drains the queue through Google Cloud.
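To make that idea concrete, here is a rough, untested sketch of the bridge I have in mind. It assumes I change the frontend to emit raw blob bytes on a plain 'stream' event (dropping socket.io-stream), and the 'start'/'stop' events are hypothetical names I made up to open and close the gRPC stream:

import queue
from google.cloud import speech

audio_queue = queue.Queue()

@socketio.on('stream')
def on_stream(chunk):
    # chunk: raw bytes of one ~100 ms blob from the browser
    audio_queue.put(chunk)

@socketio.on('stop')
def on_stop():
    audio_queue.put(None)  # sentinel: end of audio

def queue_generator(q):
    # Yield chunks until the None sentinel arrives, mirroring
    # MicrophoneStream.generator() from the Google tutorial.
    while True:
        chunk = q.get()
        if chunk is None:
            return
        yield chunk

def transcribe_loop():
    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            # assumes 16 kHz mono LINEAR16, matching desiredSampRate above
            # (the WAV header in each blob may still need stripping)
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code='en-US',
        ),
        interim_results=True,
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in queue_generator(audio_queue)
    )
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            # push interim transcripts back to the React frontend
            socketio.emit('transcript', result.alternatives[0].transcript)

@socketio.on('start')
def on_start():
    # run the Google client loop without blocking the socket handlers
    socketio.start_background_task(transcribe_loop)

Would something along these lines work, or is there a cleaner way to turn the incoming socket events into the generator that streaming_recognize expects?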