How to download a single audio file from HuggingFace datasets?


I am trying to download a single audio file from a HuggingFace dataset in Google Colab with the code below, but I get the following error.

pip install datasets
from datasets import load_dataset
ds = load_dataset('imvladikon/hebrew_speech_kan')
a = ds['train'][0]['audio']['path']
print(a)

from huggingface_hub import hf_hub_download
audio_file_url = '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav'
hf_hub_download(audio_file_url)

Error:

---------------------------------------------------------------------------
HFValidationError                         Traceback (most recent call last)
<ipython-input-36-6fb2d1a885ee> in <cell line: 3>()
      1 from huggingface_hub import hf_hub_download
      2 audio_file_url = '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav'
----> 3 hf_hub_download(audio_file_url)

1 frames
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py in validate_repo_id(repo_id)
    156 
    157     if repo_id.count("/") > 1:
--> 158         raise HFValidationError(
    159             "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160             f" '{repo_id}'. Use `repo_type` argument if needed."

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav'. Use `repo_type` argument if needed.

I also tried building the URL from the repo id and filename:

from huggingface_hub import hf_hub_url
hf_hub_url(
    repo_id="imvladikon/hebrew_speech_kan", filename="e429593fede945c185897e378a5839f4198.wav"
)

This outputs the URL https://huggingface.co/imvladikon/hebrew_speech_kan/resolve/main/e429593fede945c185897e378a5839f4198.wav. However, when I open it, the HuggingFace website reports that this repository is not available.

Any help would be appreciated.


There is 1 answer

Answer by Jatin Sehrawat

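hf_hub_download needs a repo_id and a filename, not a local path. Because this repository is a dataset, it also needs repo_type="dataset"; without it the Hub looks for a model repository, which is why the generated URL reports that the repository is not available. The filename must be the file's path inside the repository, so this only works if the .wav file is stored there under that name. Try this:

hf_hub_download(repo_id="imvladikon/hebrew_speech_kan", filename="e429593fede945c185897e378a5839f4198.wav", repo_type="dataset")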
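
Another option is to skip the second download altogether. load_dataset has already downloaded and extracted the audio, so the path printed in the question points to a file on the Colab disk that can simply be copied out of the cache. A minimal sketch, assuming that cached path still exists locally; export_audio_file and the exported_audio directory are hypothetical names for illustration, not part of datasets or huggingface_hub:

```python
import shutil
from pathlib import Path


def export_audio_file(cached_path, dest_dir="exported_audio"):
    """Copy one audio file out of the local datasets cache (hypothetical helper)."""
    src = Path(cached_path)
    if not src.is_file():
        # The cache may have been cleared, or the dataset may not expose a local path
        raise FileNotFoundError(f"No cached file at {src}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the original file name and timestamps
    return Path(shutil.copy2(src, dest / src.name))
```

With the dataset from the question loaded as ds, this could be called as export_audio_file(ds['train'][0]['audio']['path']). In Colab, the copied file can then be saved to your machine with google.colab.files.download.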