Streaming audio data from a Hugging Face dataset, or emulating streaming given the sample data


I would like to access various datasets from huggingface.co that contain audio data. To begin, I am using the GigaSpeech dataset.

I understand how to use an IterableDataset (by passing streaming=True when calling load_dataset(...)). However, this appears to download each audio file in full: the returned item has a key audio whose value has keys path and array, where array appears to contain the sample data for the entire audio file.

I am using torchaudio.io.StreamReader, which appears to support streaming from a URL (i.e. from a remote file). I am wondering if it might be possible to have the IterableDataset (or something like it) iterate over the URLs to the audio files rather than downloading them directly.

If this is not possible: I've looked in the cache folder several times and I can't find the audio file, or even the folder that path seems to refer to. At any rate, since array seems to contain the audio data from the file, reading the source file itself appears unnecessary. However, torchaudio.io.StreamReader does not seem to support "streaming" from an array. I would like to know the best way to easily perform "streaming", with possible resampling, over the array (its dtype is torch.float64, but it will need to be converted to numpy.float32 at some point).

Obviously, I could implement my own windowing and resampling on the array, but it would be much better if I could use something pre-existing that works out of the box very similarly to the StreamReader.
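For reference, the manual approach I would rather avoid could look roughly like the sketch below. It rests on several assumptions: the helper name stream_chunks and its parameters are hypothetical (they are not part of datasets or torchaudio), it uses only NumPy, and the resampling is plain linear interpolation via np.interp with no anti-aliasing filter, so it is not a substitute for a proper band-limited resampler. It also resamples the whole array up front and only then yields fixed-size chunks, so it emulates streaming rather than performing it.

```python
import numpy as np


def stream_chunks(array, orig_rate, target_rate, frames_per_chunk):
    """Yield float32 chunks of `array` resampled to `target_rate`.

    Hypothetical helper for illustration only. Resampling is done by
    linear interpolation (no low-pass filtering), which can alias when
    downsampling.
    """
    # Accept anything array-like (e.g. a NumPy array or a CPU tensor).
    samples = np.asarray(array, dtype=np.float64)
    if samples.size == 0:
        return

    # Number of samples the signal has at the target rate.
    n_out = int(round(len(samples) * target_rate / orig_rate))

    # Position of each output sample, measured in input-sample units.
    positions = np.arange(n_out) * (orig_rate / target_rate)

    # Linear interpolation; positions past the end clamp to the last sample.
    resampled = np.interp(positions, np.arange(len(samples)), samples)
    resampled = resampled.astype(np.float32)

    # Hand the signal out in fixed-size windows; the last one may be shorter.
    for start in range(0, n_out, frames_per_chunk):
        yield resampled[start:start + frames_per_chunk]
```

With a dataset item, this would presumably be called as stream_chunks(item["audio"]["array"], item["audio"]["sampling_rate"], 16000, 1024), assuming the decoded audio entry also carries a sampling_rate key alongside path and array.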
