I am working on an Android project where I need to convert speech to text, either from a raw audio buffer or from a stored WAV file. Is this possible on Android? More specifically, I get audio buffers from here:
record.read(audioBuffer, 0, audioBuffer.length);
I process the audio buffer and store it as a WAV file. I need to convert the processed audio buffer to text or, once the buffer has been saved as a WAV file, convert that file to text using Google's offline speech-to-text option. How can I do this? I have seen other threads here, but they are very old (4, 6, or 7 years old).
Since Android 13, SpeechRecognizer can accept a file or real-time PCM data as input. I wrote a test project and got it working.
At the moment there is one catch: SpeechRecognizer does not seem to work with every sample rate. For example, I recorded a PCM clip at 22050 Hz, but if I set EXTRA_AUDIO_SOURCE_SAMPLING_RATE to 22050, SpeechRecognizer fails. With the value changed to 16000 or 24000, the same audio clip is recognized.
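If you would rather hand the recognizer audio that really is at one of the rates that worked, one option is to resample the clip first. The following is a minimal sketch and is not taken from the test project: `PcmResampler` and `resample` are hypothetical names, it assumes mono 16-bit samples, and it uses plain linear interpolation with no low-pass filter, so some aliasing is possible when downsampling.

```java
/** Hypothetical helper: resamples mono 16-bit PCM by linear interpolation. */
final class PcmResampler {
    private PcmResampler() {}

    static short[] resample(short[] input, int srcRate, int dstRate) {
        if (srcRate <= 0 || dstRate <= 0) {
            throw new IllegalArgumentException("sample rates must be positive");
        }
        if (input.length == 0 || srcRate == dstRate) {
            return input.clone();
        }
        // Number of output samples covering the same duration as the input.
        int outLength = (int) ((long) input.length * dstRate / srcRate);
        short[] output = new short[outLength];
        double step = (double) srcRate / dstRate;
        for (int i = 0; i < outLength; i++) {
            // Position of output sample i on the input time axis.
            double pos = i * step;
            int left = Math.min((int) pos, input.length - 1);
            int right = Math.min(left + 1, input.length - 1);
            double frac = pos - left;
            // Blend the two neighbouring input samples.
            output[i] = (short) Math.round(input[left] * (1.0 - frac) + input[right] * frac);
        }
        return output;
    }
}
```

For example, `PcmResampler.resample(samples, 22050, 16000)` turns one second of 22050 Hz audio into 16000 samples, which could then be declared with a sampling rate of 16000.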
Here is how my test project works. I omitted the RECORD_AUDIO runtime permission request; just grant the permission in the phone's settings after the first crash:
Part 0. Record a raw PCM file of English speech: linear 16-bit, little-endian. I am using a 22050 Hz sample rate. Put the file at res/raw/test.pcm
Part 1. Create an Android Studio project. In the manifest, add the following at the end of the root tag:
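For reference, "linear 16 bits little endian" means each sample occupies two bytes, low byte first. A minimal sketch of decoding such a file's bytes in plain Java follows; `PcmFormat`, `toSamples`, and `durationSeconds` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for raw 16-bit little-endian PCM. */
final class PcmFormat {
    private PcmFormat() {}

    /** Converts raw PCM bytes into signed 16-bit samples. */
    static short[] toSamples(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcm.length / 2];
        // Little-endian: the first byte of each pair is the low byte.
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Duration of a 16-bit PCM clip given its size in bytes. */
    static double durationSeconds(int byteCount, int sampleRate, int channels) {
        return byteCount / (2.0 * channels * sampleRate);
    }
}
```

A quick check on the duration is a cheap way to confirm that the sample rate you believe the clip has matches what was actually recorded.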
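A minimal sketch of what such manifest entries typically look like, assuming the app records from the microphone and targets API 30 or later, where package visibility rules require declaring a query for the recognition service. Treat this as an assumption rather than the exact snippet from the test project:

```xml
<!-- Inside the root <manifest> element -->

<!-- Lets the app find and bind to a speech recognition service on API 30+ -->
<queries>
    <intent>
        <action android:name="android.speech.RecognitionService" />
    </intent>
</queries>

<!-- Needed for microphone capture -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```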
Part 2. Add all of the following code blocks to the MainActivity class.
i. Variables
ii. Functions for the SpeechRecognizer life cycle. Note: a sample rate of 16000 or 24000 works, but 22050 does not, even though the original source was recorded at 22050 Hz.
iii. AudioRecord thread, used when you choose real-time PCM data
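The real-time path boils down to a loop that reads buffers from the recorder and writes them to the stream the recognizer reads from. The sketch below shows that loop in plain Java so it can run off-device. `PcmSource` and `PcmPump` are hypothetical names: `PcmSource` mirrors the shape of `AudioRecord.read(byte[], int, int)`, and on Android the sink would presumably be the write side of a pipe (for example from `ParcelFileDescriptor.createPipe()`) whose read side is passed to the recognizer as EXTRA_AUDIO_SOURCE.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in with the same shape as AudioRecord.read(byte[], int, int). */
interface PcmSource {
    int read(byte[] buffer, int offset, int length);
}

/** Hypothetical worker that copies PCM buffers from a source to a sink until stopped. */
final class PcmPump implements Runnable {
    private final PcmSource source;
    private final OutputStream sink;
    private final int bufferSize;
    private final AtomicBoolean running = new AtomicBoolean(true);

    PcmPump(PcmSource source, OutputStream sink, int bufferSize) {
        this.source = source;
        this.sink = sink;
        this.bufferSize = bufferSize;
    }

    /** Asks the loop to finish after the current read. */
    void stop() {
        running.set(false);
    }

    @Override
    public void run() {
        byte[] buffer = new byte[bufferSize];
        // Closing the sink on exit signals end of stream to the reader.
        try (OutputStream out = sink) {
            while (running.get()) {
                int n = source.read(buffer, 0, buffer.length);
                if (n < 0) {
                    break; // negative values are treated as an error or end of input
                }
                if (n > 0) {
                    out.write(buffer, 0, n);
                }
            }
        } catch (IOException e) {
            // The reading side went away; there is nothing left to deliver.
        }
    }
}
```

On a device, `record::read` could serve as the `PcmSource`, with the pump started on its own thread via `new Thread(pump).start()` and ended with `pump.stop()`.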
iv. Utility functions
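For the stored-WAV case from the question, the test project feeds the recognizer raw PCM, so one plausible utility is a helper that pulls the PCM payload out of a WAV file. A minimal sketch, assuming an uncompressed PCM RIFF/WAVE file small enough to hold in memory; `WavPcm` and `extractPcm` are hypothetical names, not functions from the test project:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper that extracts raw PCM from an in-memory WAV file. */
final class WavPcm {
    private WavPcm() {}

    /** Returns the bytes of the first "data" chunk of a RIFF/WAVE file. */
    static byte[] extractPcm(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts after "RIFF", the size field, and "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            long start = pos + 8L;
            if (id.equals("data")) {
                // Tolerate a size field that overstates the payload.
                long end = Math.min(start + size, wav.length);
                return Arrays.copyOfRange(wav, (int) start, (int) end);
            }
            // Skip other chunks such as "fmt "; chunks are padded to an even length.
            long next = start + size + (size & 1);
            if (next > wav.length) {
                break;
            }
            pos = (int) next;
        }
        throw new IllegalArgumentException("no data chunk found");
    }

    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

The sample rate and channel count stored in the WAV's "fmt " chunk still have to match whatever you tell the recognizer, so the sample-rate caveat above applies here as well.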
v. Main function in onStart()