I am trying to work with an audio-text pair dataset from Hugging Face (https://huggingface.co/datasets/MLCommons/peoples_speech). Since the dataset is large, I want to stream it and use it as an iterable.
from datasets import load_dataset
dataset = load_dataset("MLCommons/peoples_speech", split='train', streaming=True)
# Keep only the first 10 examples of the stream
dataset = dataset.take(10)
The dataset is an iterable whose elements are dictionaries like the following:
{'id': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00000.flac', 'audio': {'path': '07282016HFUUforum_SLASH_07-28-2016_HFUUforum_DOT_mp3_00000.flac', 'array': array([ 0.14205933, 0.20620728, 0.27151489, ..., 0.00402832,
-0.00628662, -0.01422119]), 'sampling_rate': 16000}, 'duration_ms': 14920, 'text': "i wanted this to share a few things but i'm going to not share as much as i wanted to share because we are starting late i'd like to get this thing going so we all get home at a decent hour this this election is very important to"}
I can get the text with the key 'text', but I am not sure how to get the audio file. There is a path under the 'audio' key, but I don't know how to use it. Is there a way to download and save the audio file so that I can use it later in my Python script? I want to convert this .flac file to .wav format and pass it to an audio encoder.
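To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It assumes each element looks exactly like the dictionary above, that is, 'audio' holds a mono, one-dimensional 'array' of floats in the range [-1, 1] plus a 'sampling_rate'. The function name save_sample_as_wav and the output directory are hypothetical, and writing 16-bit PCM with the standard-library wave module is my own choice, not something the dataset prescribes. With these assumptions the 'path' value is not needed at all, since the decoded samples are already in 'array'.

```python
import os
import struct
import wave


def save_sample_as_wav(sample, out_dir="."):
    """Write one streamed example to a 16-bit PCM .wav file and return its path.

    Assumption: sample["audio"]["array"] is a mono 1-D sequence of floats
    in [-1, 1], as in the example dictionary shown in the question.
    """
    audio = sample["audio"]
    # Clip each value to [-1, 1] and scale it to a signed 16-bit integer.
    pcm = [int(max(-1.0, min(1.0, float(x))) * 32767) for x in audio["array"]]

    # Reuse the example id, swapping the .flac extension for .wav.
    base = os.path.splitext(os.path.basename(sample["id"]))[0]
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, base + ".wav")

    with wave.open(out_path, "wb") as wav_file:
        wav_file.setnchannels(1)                        # mono (assumed)
        wav_file.setsampwidth(2)                        # 2 bytes = 16-bit PCM
        wav_file.setframerate(audio["sampling_rate"])   # e.g. 16000
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return out_path


# Hypothetical usage with the streamed dataset from the question:
# for sample in dataset:
#     print(save_sample_as_wav(sample, out_dir="wavs"))
```

The pure-Python conversion loop is slow for long clips; if the array is a NumPy array, the same clipping and scaling could be done with NumPy operations instead.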