How to resample from 8 kHz to 16 kHz with librosa or torchaudio the way ffmpeg does it?

Question

360 views Asked by user3668129 At 07 November 2023 at 12:12

In my app:

I'm getting an array of audio samples (sample rate 8000) that was loaded with torchaudio.load.
I need to run Whisper (STT) on this audio array.
For efficiency, I want to avoid loading the wav file a second time with Whisper (load_audio), so I need to resample the array to 16000 myself.
whisper.load_audio uses ffmpeg to load the audio and resample it to 16000. I'm trying to resample the array with librosa or torchaudio instead, but their output never seems to match ffmpeg's.
(I assume that if I use a different resampling method from the one the Whisper model was trained on, I may get bad results.)

Example: loading test.wav (SR=8000) and printing the first 5 samples:

whisper_audio = whisper.load_audio(file) => [-0.00082397 -0.00115967 -0.00186157 -0.00231934 -0.00222778 ...]

Loading with torchaudio and resampling with librosa:

librosa.resample(vad_audio, orig_sr=8000, target_sr=16000, scale=True, res_type='kaiser_best') => [-0.00082317 -0.0010577 -0.0013937 -0.0016688 -0.00186235 ...]

The values are different.
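To put a number on the mismatch, here is a minimal sketch that compares the two five-sample excerpts quoted above. It assumes whisper.load_audio decodes to 16-bit PCM and divides by 32768, so one PCM step (1/32768) is a natural tolerance; the helper name max_abs_diff is made up for illustration.

```python
import numpy as np

# First five samples quoted in the question.
whisper_audio = np.array(
    [-0.00082397, -0.00115967, -0.00186157, -0.00231934, -0.00222778],
    dtype=np.float32,
)
librosa_audio = np.array(
    [-0.00082317, -0.0010577, -0.0013937, -0.0016688, -0.00186235],
    dtype=np.float32,
)

# Assumption: whisper.load_audio returns int16 PCM divided by 32768,
# so its values sit on a grid with this spacing.
PCM16_STEP = 1.0 / 32768.0


def max_abs_diff(a, b):
    """Largest sample-wise absolute difference over the overlapping prefix."""
    n = min(len(a), len(b))
    return float(np.max(np.abs(a[:n] - b[:n])))


diff = max_abs_diff(whisper_audio, librosa_audio)
print(f"max abs diff: {diff:.6g} ({diff / PCM16_STEP:.1f} PCM16 steps)")
```

If the difference were below one PCM step, it could be explained by 16-bit rounding alone. Here it is well above that, which suggests the two resamplers really do use different filters.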

How can I resample the audio exactly the way ffmpeg does?

There is 1 answer

**moto** · Answer 1 · 2023-12-28T07:39:31+00:00

You can use torchaudio.io.StreamReader to load and resample audio. This functionality is implemented with ffmpeg, so you might be able to produce the same waveform.

When you use the add_basic_audio_stream method with the sample_rate option, it uses FFmpeg's filter function to apply the resampling.

https://pytorch.org/audio/2.1.1/generated/torchaudio.io.StreamReader.html#add-basic-audio-stream
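As a rough sketch of that approach, assuming the torchaudio 2.1 API linked above (newer torchaudio releases may have deprecated or removed torchaudio.io, so check your installed version). The function name load_resampled and the chunk size are made up for illustration.

```python
def load_resampled(path, target_sr=16000, frames_per_chunk=16000):
    """Decode `path` with FFmpeg via torchaudio and resample to `target_sr`.

    Returns a float32 tensor of shape (frames, channels).
    """
    # Imported lazily so this module loads even without torchaudio installed.
    import torch
    from torchaudio.io import StreamReader

    reader = StreamReader(path)
    # sample_rate asks FFmpeg's filter graph to resample while decoding.
    reader.add_basic_audio_stream(
        frames_per_chunk=frames_per_chunk,
        sample_rate=target_sr,
    )
    # stream() yields a tuple per iteration, one chunk per output stream.
    chunks = [chunk for (chunk,) in reader.stream()]
    return torch.cat(chunks, dim=0)
```

Usage would look like `waveform = load_resampled("test.wav")`. Whether the result matches whisper.load_audio sample for sample is something to verify on your own file, since Whisper also converts to mono 16-bit PCM.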

If the ffmpeg command uses a non-default resampling method, you need to construct the same filter description and pass it to the add_audio_stream method.

https://pytorch.org/audio/2.1.1/generated/torchaudio.io.StreamReader.html#add-audio-stream
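A sketch of the filter-description route, under the assumption that whisper.load_audio runs ffmpeg with mono output, 16-bit PCM, and the default resampler (roughly `-ac 1 -ar 16000 -f s16le`). The aformat string below is my approximation of what those flags do, not something taken from Whisper or torchaudio, and the helper names are made up for illustration.

```python
def whisper_like_filter(target_sr=16000):
    """Filter description approximating mono, 16-bit output at `target_sr`."""
    return (
        f"aformat=sample_fmts=s16:sample_rates={target_sr}"
        ":channel_layouts=mono"
    )


def load_like_whisper(path, target_sr=16000, frames_per_chunk=16000):
    """Decode with FFmpeg, then scale int16 PCM to float32 in [-1, 1)."""
    # Imported lazily so this module loads even without torchaudio installed.
    import torch
    from torchaudio.io import StreamReader

    reader = StreamReader(path)
    reader.add_audio_stream(
        frames_per_chunk=frames_per_chunk,
        filter_desc=whisper_like_filter(target_sr),
    )
    chunks = [chunk for (chunk,) in reader.stream()]
    pcm = torch.cat(chunks, dim=0)  # int16, shape (frames, 1)
    # Same scaling Whisper is assumed to apply after decoding.
    return pcm.squeeze(1).to(torch.float32) / 32768.0
```

If the ffmpeg command you want to match uses other resampler options, the filter string is the place to mirror them. Comparing the first few samples against whisper.load_audio on the same file is a quick way to check whether the description is close enough.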
