Streaming Audio from React.js to Flask


My goal is to create a website that can stream audio data from the microphone to the backend for processing and real-time responses (e.g. real-time transcription). Currently, my project has a React.js frontend with a Flask backend (all my preprocessing is in Python), and I found a great Medium tutorial on this specific task here:

https://medium.com/google-cloud/building-a-client-side-web-app-which-streams-audio-from-a-browser-microphone-to-a-server-part-ii-df20ddb47d4e

Now, I have replicated the frontend code. The relevant code for this task is:

// assuming these packages: socket.io-client, socket.io-stream, recordrtc
import io from 'socket.io-client';
import ss from 'socket.io-stream';
import RecordRTC, { StereoAudioRecorder } from 'recordrtc';

const socket = io('http://localhost:5000');
// ... some other code ...
let recordAudio;

// (navigator.getUserMedia is deprecated; navigator.mediaDevices.getUserMedia
// is the modern promise-based replacement)
navigator.getUserMedia({
  audio: true
}, function (stream) {

  // 5)
  recordAudio = RecordRTC(stream, {
    type: 'audio',

    // 6)
    mimeType: 'audio/webm',
    sampleRate: 44100,
    // used by StereoAudioRecorder;
    // valid range is 22050 to 96000.
    // let us force 16 kHz recording:
    desiredSampRate: 16000,

    // MediaStreamRecorder, StereoAudioRecorder, WebAssemblyRecorder
    // CanvasRecorder, GifRecorder, WhammyRecorder
    recorderType: StereoAudioRecorder,
    // Dialogflow / STT requires mono audio
    numberOfAudioChannels: 1,

    // fire ondataavailable every 100 ms
    timeSlice: 100,

    ondataavailable: function (blob) {

      // 3
      // making use of socket.io-stream for bi-directional
      // streaming, create a stream
      var stream = ss.createStream();
      // stream directly to server
      // it will be temp. stored locally
      ss(socket).emit('stream', stream, {
        name: 'stream.wav',
        size: blob.size
      });
      // pipe the audio blob into the read stream
      ss.createBlobReadStream(blob).pipe(stream);

      console.log('Sent some data, hopefully');
    }
  });

  recordAudio.startRecording();
}, function (error) {
  console.error(error);
});
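
On the Flask side, the receiving end boils down to something like the following minimal Flask-SocketIO sketch (the handler bodies here are illustrative placeholders, not my full server; the 'stream' event name mirrors the frontend emit):

# Minimal sketch of the receiving side (flask-socketio assumed).
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins='*')

@socketio.on('connect')
def on_connect():
    # this fires, so the connection itself works
    print('client connected')

@socketio.on('stream')
def on_stream(stream, meta):
    # this never fires for the socket.io-stream emits
    print('received chunk', meta)

if __name__ == '__main__':
    socketio.run(app, port=5000)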

Now, my Flask backend is capable of getting a connection from the frontend (the connect handler in the sketch above fires), but it never sees any of the emits carrying the audio data. Basically, my goal is to replicate the next part of the tutorial:

https://medium.com/google-cloud/building-a-web-server-which-receives-a-browser-microphone-stream-and-uses-dialogflow-or-the-speech-62b47499fc71

which creates an Express server and does some NLP tasks. My goal is to run the stream through Google Cloud Speech-to-Text on the Flask backend and emit the transcription results in real time to the React frontend. I looked, and Google does have a tutorial for both Node.js and Python, located here:

https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-mic-recognize-python

where the Python code uses a PyAudio-based MicrophoneStream as a stream/generator and passes it into the Google Cloud client:

with MicrophoneStream(RATE, CHUNK) as stream:
    audio_generator = stream.generator()
    requests = (
        speech.StreamingRecognizeRequest(audio_content=content)
        for content in audio_generator
    )

    responses = client.streaming_recognize(streaming_config, requests)

    # Now, put the transcription responses to use.
    listen_print_loop(responses)
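
Once a responses generator like this exists on the Flask side, I assume pushing results back out is just a matter of swapping listen_print_loop for a loop that emits over the socket, something like this sketch (the 'transcript' event name is a placeholder of mine):

# Sketch: forward interim/final transcripts to the React client
# instead of printing them.
def forward_responses(responses):
    for response in responses:
        for result in response.results:
            socketio.emit('transcript', {
                'text': result.alternatives[0].transcript,
                'is_final': result.is_final,
            })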

My question is: how can I have Flask accept the BlobReadStream data from the frontend and expose it as a Python generator so that I can feed the data into Google Cloud? One approach I have considered is using threads or async to build a queue of blobs, as in the Google Cloud tutorial, while another thread streams them through Google Cloud.
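
To make that idea concrete, here is a rough sketch of what I have in mind, modeled on the tutorial's MicrophoneStream but fed by the socket handler instead of PyAudio (the class and method names are mine):

import queue

class SocketAudioStream:
    """Buffers audio chunks pushed by the socket handler and exposes
    them as a blocking generator, like the tutorial's MicrophoneStream."""

    def __init__(self):
        self._buff = queue.Queue()
        self.closed = False

    def push(self, chunk):
        # called from the Flask-SocketIO handler for each incoming blob
        self._buff.put(chunk)

    def close(self):
        self.closed = True
        self._buff.put(None)  # unblock the consumer

    def generator(self):
        while not self.closed:
            chunk = self._buff.get()  # blocks until a chunk arrives
            if chunk is None:
                return
            yield chunk

# the consumer side would then mirror the Google sample:
# requests = (speech.StreamingRecognizeRequest(audio_content=c)
#             for c in stream.generator())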
