Speech recognition from a WAV file or from a processed raw audio buffer

Question


Asked by threewire on 20 November 2017 at 10:25 · 1.7k views

I am working on an Android project where I need to perform speech-to-text on raw audio buffer data or on a stored WAV file. Is it possible to do this on Android? More specifically, I get the audio buffers from here:

record.read(audioBuffer, 0, audioBuffer.length);

I process the audio buffer and store it as a WAV file. I need to convert the processed audio buffer to text. Alternatively, once the buffer has been saved as a WAV file, can I convert that file to text using Google's offline speech-to-text option? Please let me know how to do this. I have seen other threads here, but they are very old (4, 6, or 7 years old).
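For the "store it as a WAV file" step: a canonical WAV file is the raw samples preceded by a 44-byte RIFF header. The sketch below builds that header in plain Java. The class name `WavHeader` and its `build` method are hypothetical, and it assumes 16-bit PCM samples written after the header in little-endian order (for example, samples captured by `AudioRecord` with `ENCODING_PCM_16BIT`).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the question): builds the 44-byte header of a
// canonical 16-bit PCM WAV file. All multi-byte fields are little-endian.
final class WavHeader {

    private WavHeader() { }

    static byte[] build(int sampleRate, int channels, int pcmDataBytes) {
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per frame
        final int byteRate = sampleRate * blockAlign;        // bytes per second

        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcmDataBytes);          // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                         // fmt chunk size for plain PCM
        b.putShort((short) 1);                // audio format 1 = linear PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcmDataBytes);               // size of the raw sample data
        return b.array();
    }
}
```

Write the header first and the sample bytes after it. If the total length is only known when recording stops, one option is to write a placeholder header and overwrite it afterwards, for example with `RandomAccessFile`.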

Original Q&A

There are 2 answers

**threewire** · Answer 1 · 2017-11-25T15:33:25+00:00

I came across Google's Cloud Speech API, which can take a raw audio file as input and perform asynchronous speech recognition. I have limited experience with app development and with Java. This link shows how to do it: https://cloud.google.com/speech/docs/async-recognize and here is a more complete source example: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/QuickstartSample.java. The problem is that when I add the following import statements to MainActivity.java in Android Studio, they are greyed out and some are marked in red:

import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

**Homer Wang** · Answer 2 · 2023-11-23T07:03:01+00:00

Since Android 13 (API level 33), SpeechRecognizer can accept a file or real-time PCM data as input. I wrote a test project that successfully makes this work.

One caveat: SpeechRecognizer does not seem to accept every sample rate. For example, I recorded a PCM clip at 22050 Hz, but if I set EXTRA_AUDIO_SOURCE_SAMPLING_RATE to 22050, SpeechRecognizer fails. If I change it to 16000 or 24000, the same audio clip is recognized.
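The workaround above labels 22050 Hz audio as 24000 or 16000 Hz, so the recognizer effectively hears it slightly sped up or slowed down. An alternative, which the answer did not test, is to convert the samples to 16000 Hz before writing them to the file or pipe. The following is a minimal sketch using linear interpolation; `PcmResampler` is a hypothetical name, and because it applies no low-pass filter it can introduce some aliasing, so treat it as illustrative rather than production quality.

```java
// Hypothetical helper, not part of the answer: converts 16-bit mono PCM from one
// sample rate to another by linear interpolation (e.g. 22050 Hz -> 16000 Hz).
// No anti-aliasing filter is applied, so this is only a rough sketch.
final class PcmResampler {

    private PcmResampler() { }

    static short[] resample(short[] in, int fromRate, int toRate) {
        if (fromRate <= 0 || toRate <= 0) {
            throw new IllegalArgumentException("sample rates must be positive");
        }
        if (in.length == 0 || fromRate == toRate) {
            return in.clone();
        }
        int outLength = (int) ((long) in.length * toRate / fromRate);
        short[] out = new short[outLength];
        double step = (double) fromRate / toRate; // input samples per output sample
        for (int i = 0; i < outLength; i++) {
            double pos = i * step;
            int i0 = (int) pos;                        // sample at or before pos
            int i1 = Math.min(i0 + 1, in.length - 1);  // next sample, clamped at the end
            double frac = pos - i0;
            // weighted average of the two neighbours stays within the short range
            double v = in[i0] * (1.0 - frac) + in[i1] * frac;
            out[i] = (short) Math.round(v);
        }
        return out;
    }
}
```

With the converted samples, EXTRA_AUDIO_SOURCE_SAMPLING_RATE would then be set to the rate the data really has (16000 in this example).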

Here is how my test project works. I omitted the RECORD_AUDIO permission request; just grant the permission in the phone's Settings after the first crash:

Part 0. Record a raw PCM file of English speech (linear 16-bit, little-endian). I am using a 22050 Hz sample rate. Put the file at res/raw/test.pcm.
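Part 0 expects a headerless .pcm file, while the question starts from a WAV file. If the WAV already contains 16-bit little-endian PCM, the samples can be pulled out by walking the RIFF chunks until the "data" chunk. This is a sketch under that assumption: `WavToPcm.extractPcm` is a hypothetical helper, and it does not inspect the fmt chunk, so a WAV with a different encoding, channel count, or sample rate would need separate conversion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: returns the raw sample bytes of a RIFF/WAVE file.
// Assumes the samples are already 16-bit little-endian PCM; no format check.
final class WavToPcm {

    private WavToPcm() { }

    static byte[] extractPcm(byte[] wav) {
        if (wav.length < 12
                || !"RIFF".equals(new String(wav, 0, 4, StandardCharsets.US_ASCII))
                || !"WAVE".equals(new String(wav, 8, 4, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        ByteBuffer b = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after "WAVE"
        while (pos + 8 <= wav.length) {
            String id = new String(wav, pos, 4, StandardCharsets.US_ASCII);
            long size = b.getInt(pos + 4) & 0xFFFFFFFFL; // chunk sizes are unsigned
            int start = pos + 8;
            if (id.equals("data")) {
                // clamp in case the header claims more bytes than the file holds
                int end = (int) Math.min(wav.length, start + size);
                return Arrays.copyOfRange(wav, start, end);
            }
            long next = start + size + (size & 1); // odd-sized chunks are padded
            if (next > wav.length) {
                break;
            }
            pos = (int) next;
        }
        throw new IllegalArgumentException("no data chunk");
    }
}
```

The returned bytes can be saved as test.pcm, or written directly into the pipe used for real-time input.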

Part 1. Create an Android Studio project. In the manifest, add the following at the end of the root tag:

   <manifest xmlns:...
       ...
       <uses-permission android:name="android.permission.INTERNET" />
       <uses-permission android:name="android.permission.RECORD_AUDIO" />
       <queries>
           <intent>
               <action android:name="android.speech.RecognitionService" />
           </intent>
       </queries> 
   </manifest>

Part 2. Add all of the following code blocks to the MainActivity class.

i. Variables

   // toggle either function of this sample project
   // 1 for PCM file in res/raw
   // 2 for real time PCM data from AudioRecord
   static final int AUDIO_SOURCE_TYPE = 1; 
   android.speech.SpeechRecognizer speechRecognizer = null;
   ParcelFileDescriptor[] m_audioPipe;
   ParcelFileDescriptor mExtraAudioPFD;
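   volatile ParcelFileDescriptor.AutoCloseOutputStream mOutputStream; // shared with the recording thread
   AudioRecord audioRec;
   Thread m_hAutoRecordThread;
   volatile boolean m_bTerminateThread; // set on the main/timer thread, read by the recording thread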

ii. Functions for the lifecycle of SpeechRecognizer. Note: the sampling rate works at 16000 and 24000 Hz but not at 22050 Hz, even though the source was recorded at 22050 Hz.

@RequiresApi(api = Build.VERSION_CODES.TIRAMISU)
private final Intent createSpeechRecognizerIntent() {

    final Intent speechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 3000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, 6000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS, 2000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");

    if (AUDIO_SOURCE_TYPE == 1) {
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, mExtraAudioPFD);
    } else if (AUDIO_SOURCE_TYPE == 2) {
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, m_audioPipe[0]);
    }
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, AudioFormat.ENCODING_PCM_16BIT);
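    // 24000 (or 16000) works; 22050 makes the recognizer fail, although the clip was recorded at 22050 Hz
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 24000);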


    return speechRecognizerIntent;
}

protected void initRecognizer() {

    speechRecognizer = android.speech.SpeechRecognizer.createSpeechRecognizer(this);
    speechRecognizer.setRecognitionListener(new RecognitionListener() {
        @Override public void onReadyForSpeech(Bundle bundle) { Log.i("recognizer", "onReadyForSpeech"); }
        @Override public void onBeginningOfSpeech() { Log.i("recognizer", "onBeginningOfSpeech"); }
        @Override public void onRmsChanged(float v) {
            Log.i("onRmsChanged", "v = " + v);
        }
        @Override public void onBufferReceived(byte[] bytes) { ; }
        @Override public void onEndOfSpeech() {
            Log.i("recognizer", "onEndOfSpeech");
            stopRecognizer();
        }
        @Override public void onError(int i) { Log.i("recognizer", "onError = " + i); }
        @Override public void onResults(Bundle bundle) {

            Log.i("recognizer", "onResults");
            final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);

            if (data != null && data.size() > 0) {
                String resultData = data.get(0);
                Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
            }
        }
        @Override public void onPartialResults(Bundle bundle) {

            Log.i("recognizer", "onPartialResults");
            final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);

            if (data != null && data.size() > 0) {
                String resultData = data.get(0);
                Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
            }
        }
        @Override public void onEvent(int i, Bundle bundle) { Log.i("recognizer", "onEvent"); }
    });
}

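void stopRecognizer() {

    // tell the recording thread (AUDIO_SOURCE_TYPE == 2) to leave its loop
    m_bTerminateThread = true;
    // SpeechRecognizer methods must be called on the main thread
    new Handler(Looper.getMainLooper()).post(new Runnable() {
        @Override
        public void run() {

            if (speechRecognizer != null) {
                // speech captured so far is still recognized and delivered to onResults()
                speechRecognizer.stopListening();
                try {
                    // closing the write end of the pipe signals end-of-stream to the reader
                    if (mOutputStream != null) {
                        mOutputStream.close();
                        mOutputStream = null;
                    }
                } catch (IOException e) {
                    Log.w("recognizer", "closing the audio pipe failed", e);
                }
                speechRecognizer = null;
            }
        }
    });
}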

iii. AudioRecord thread, used only when real-time PCM data is selected (AUDIO_SOURCE_TYPE == 2)

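private class RecordingRunnable implements Runnable {

    @Override
    public void run() {
        short[] readBuf = new short[1024];
        while (!m_bTerminateThread) {
            int readLength = audioRec.read(readBuf, 0, readBuf.length);
            if (readLength < 0) {
                Log.e("audioRec", "read error = " + readLength);
                break;
            }

            // convert only the samples actually read (needs java.util.Arrays)
            byte[] readBytes = ShortArrayToByteArray(Arrays.copyOf(readBuf, readLength));
            // local copy: stopRecognizer() sets the field to null on the main thread
            ParcelFileDescriptor.AutoCloseOutputStream out = mOutputStream;
            try {
                if (out != null) {
                    out.write(readBytes, 0, readBytes.length);
                    out.flush();
                }
            } catch (IOException e) {
                // the pipe was closed by stopRecognizer(); stop forwarding audio
                break;
            }
        }
        // release the microphone once recording is finished
        audioRec.stop();
        audioRec.release();
    }
}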

iv. Utility functions

protected byte[] ShortArrayToByteArray(short[] sa) {
    byte[] ret = new byte[sa.length * 2];

    ByteBuffer.wrap(ret).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(sa);
    return ret;
}
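ShortArrayToByteArray relies on ByteBuffer to lay out each 16-bit sample low byte first, which matches the linear 16-bit little-endian format used in Part 0. The plain-Java sketch below (hypothetical class `PcmBytes`) performs the same conversion but takes a sample count, so that only the samples actually returned by `AudioRecord.read()` are forwarded; for example, the sample 0x1234 becomes the bytes 0x34, 0x12.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper mirroring ShortArrayToByteArray, limited to the first
// `count` samples (e.g. the value returned by AudioRecord.read()).
final class PcmBytes {

    private PcmBytes() { }

    static byte[] toLittleEndian(short[] samples, int count) {
        if (count < 0 || count > samples.length) {
            throw new IllegalArgumentException("bad sample count: " + count);
        }
        byte[] out = new byte[count * 2];
        // the ShortBuffer view inherits the LITTLE_ENDIAN order of its parent buffer
        ByteBuffer.wrap(out).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(samples, 0, count);
        return out;
    }
}
```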

// function adapted from
// https://stackoverflow.com/questions/8664468/copying-raw-file-into-sdcard/46244121#46244121
// copies a res/raw resource into the app's private files directory and returns its path
private String copyFiletoStorage(int resourceId, String resourceName) {
    String filePath = getFilesDir().getPath() + "/" + resourceName;
    // try-with-resources closes both streams even if the copy fails
    try (InputStream in = getResources().openRawResource(resourceId);
         FileOutputStream out = new FileOutputStream(filePath)) {
        byte[] buff = new byte[1024];
        int read;
        while ((read = in.read(buff)) != -1) {
            out.write(buff, 0, read);
        }
    } catch (IOException e) {
        // also covers FileNotFoundException, a subclass of IOException
        e.printStackTrace();
    }
    return filePath;
}

v. Main function in onStart()

@Override
protected void onStart() {

    super.onStart();

    if (AUDIO_SOURCE_TYPE == 1) {
        try {
            String testFilePath = copyFiletoStorage(R.raw.test, "test.pcm");
            mExtraAudioPFD = ParcelFileDescriptor.open(new File(testFilePath), ParcelFileDescriptor.MODE_READ_ONLY);
        } catch (FileNotFoundException e) {
            mExtraAudioPFD = null;
        }
    } else if (AUDIO_SOURCE_TYPE == 2) {

        try {
            m_audioPipe = ParcelFileDescriptor.createPipe();
        } catch (IOException e) {
            finishAndRemoveTask();
            return; // no pipe, so there is nothing to feed the recognizer
        }

        mOutputStream = new ParcelFileDescriptor.AutoCloseOutputStream(m_audioPipe[1]);
    }

    initRecognizer();

    if (AUDIO_SOURCE_TYPE == 2) {
        try {
            // omitted permission check and request
            // you need to grant the RECORD_AUDIO permission manually to run this code
            audioRec = new AudioRecord(MediaRecorder.AudioSource.DEFAULT, 22050,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 524288);
        } catch (IllegalArgumentException e) {
            Log.e("audioRec", "IllegalArgument");
        } catch (SecurityException e) {
            Log.e("audioRec", "SecurityException!");
        } catch (Exception e) {
            Log.e("audioRec", "any Exception");
        }

        // the constructor may throw, or return an object that was never initialized
        if (audioRec == null || audioRec.getState() != AudioRecord.STATE_INITIALIZED) {
            Log.e("audioRec", "AudioRecord is not initialized");
            return;
        }

        m_bTerminateThread = false;

        audioRec.startRecording();
        m_hAutoRecordThread = new Thread(new RecordingRunnable(), "RecordingThread");
        m_hAutoRecordThread.start();
    }

    final Intent speechRecognizerIntent = createSpeechRecognizerIntent();
    speechRecognizer.startListening(speechRecognizerIntent);

    if (AUDIO_SOURCE_TYPE == 2) {
        new Timer().schedule(
                new TimerTask() {

                    @Override
                    public void run() {

                        stopRecognizer();
                    }
                }, 5000);
    }
}
