NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported

I am trying to load a dataset with the datasets Python module in a local Python notebook. The kernel runs Python 3.10.13, the same version as my virtual environment.

I cannot load the dataset used in a tutorial I am following. Here's the error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/Users/ari/Downloads/00-fine-tuning.ipynb Cell 2 line 3
      1 from datasets import load_dataset
----> 3 data = load_dataset(
      4     "jamescalam/agent-conversations-retrieval-tool",
      5     split="train"
      6 )
      7 data

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/load.py:2149, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   2145 # Build dataset for splits
   2146 keep_in_memory = (
   2147     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   2148 )
-> 2149 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   2150 # Rename and cast features to match task schema
   2151 if task is not None:
   2152     # To avoid issuing the same warning twice

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/builder.py:1173, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1171 is_local = not is_remote_filesystem(self._fs)
   1172 if not is_local:
-> 1173     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1174 if not os.path.exists(self._output_dir):
   1175     raise FileNotFoundError(
   1176         f"Dataset {self.dataset_name}: could not find data in {self._output_dir}. Please make sure to call "
   1177         "builder.download_and_prepare(), or use "
   1178         "datasets.load_dataset() before trying to access the Dataset object."
   1179     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

How do I resolve this? I don't understand why this error applies here: I am fetching the dataset from the Hub, so I don't see how it could already be cached in my LocalFileSystem.

Talha Tayyab (best answer)

Try doing:

pip install -U datasets

This error stems from a breaking change in fsspec. It was fixed in datasets 2.14.6 (the latest release at the time of writing), so upgrading with pip install -U datasets should resolve it.

GitHub issue: https://github.com/huggingface/datasets/issues/6352
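A quick way to check whether a kernel has the problematic combination is to read the installed versions with importlib.metadata. This is a minimal sketch, assuming (per the linked issues) that datasets releases before 2.14.6 fail with fsspec 2023.10.0 or newer; the helper names are hypothetical, and the parser only understands plain numeric versions such as 2.14.5.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from the linked issues: datasets < 2.14.6 cannot read its
# local cache when fsspec >= 2023.10.0 is installed.
FIXED_DATASETS = (2, 14, 6)
BREAKING_FSSPEC = (2023, 10, 0)


def parse_release(ver: str) -> tuple[int, ...]:
    """Keep only the leading numeric release, e.g. '2.14.6.dev0' -> (2, 14, 6)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_known_fsspec_conflict(datasets_ver: str, fsspec_ver: str) -> bool:
    """True if this version pair matches the incompatibility described above."""
    return (parse_release(datasets_ver) < FIXED_DATASETS
            and parse_release(fsspec_ver) >= BREAKING_FSSPEC)


def check_installed() -> None:
    """Print the versions the running interpreter sees and flag the known conflict."""
    try:
        ds_ver, fs_ver = version("datasets"), version("fsspec")
    except PackageNotFoundError as exc:
        print(f"Missing package metadata: {exc}")
        return
    print(f"datasets {ds_ver}, fsspec {fs_ver}")
    if has_known_fsspec_conflict(ds_ver, fs_ver):
        print("Known conflict: upgrade datasets or pin fsspec==2023.9.2")
```

Because it reads package metadata from the running interpreter, check_installed() reports what the notebook kernel actually sees, which may not match what pip reports in a separate terminal.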


If you need to stay on your current datasets version, pin fsspec instead:

pip install fsspec==2023.9.2

The breaking change was introduced in fsspec==2023.10.0.

GitHub issue: https://github.com/huggingface/datasets/issues/6330
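In a notebook, running pip from a terminal can install into a different environment than the one the kernel uses. One way to make sure the pin reaches the kernel's own interpreter is to call pip through sys.executable, as in this sketch (pip_install_command and pip_install are hypothetical helper names):

```python
import subprocess
import sys


def pip_install_command(*requirements: str) -> list[str]:
    """Build a pip command that targets the interpreter running this kernel."""
    if not requirements:
        raise ValueError("at least one requirement is needed")
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(*requirements: str) -> None:
    """Run pip for this kernel's interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(*requirements), check=True)


# Pin fsspec to the release before the breaking change (per the linked issue).
# Uncomment to run, then restart the kernel.
# pip_install("fsspec==2023.9.2")
```

Recent IPython versions also provide a %pip install fsspec==2023.9.2 magic that targets the kernel's environment. Either way, restart the kernel afterwards so the already-imported fsspec and datasets modules are reloaded.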



Edit: It looks like this broke again in datasets 2.17 and 2.18; downgrading to 2.16 should work.

lisimba8

I managed to get around it by deleting the files in the cached Hugging Face datasets folder. It is not the cleanest solution, but loading worked afterwards. Bear in mind that I was only using datasets for one dataset, so clearing the cache didn't affect anything else.
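The same cleanup can be scripted. This is a cautious sketch, not an official API: it assumes datasets resolves its cache as $HF_DATASETS_CACHE, else $HF_HOME/datasets, else ~/.cache/huggingface/datasets (with ~/.cache replaced by $XDG_CACHE_HOME if set), and it relies on the hypothetical heuristic that a dataset's cache entries contain its name. The helper names are hypothetical, and nothing is deleted unless dry_run=False is passed.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def datasets_cache_dir() -> Path:
    """Resolve the datasets cache dir from environment variables (assumed defaults)."""
    if "HF_DATASETS_CACHE" in os.environ:
        return Path(os.environ["HF_DATASETS_CACHE"]).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = os.environ.get("HF_HOME", os.path.join(xdg, "huggingface"))
    return Path(hf_home).expanduser() / "datasets"


def find_cached_entries(repo_id: str, cache_dir: Path | None = None) -> list[Path]:
    """Top-level cache entries whose name contains the dataset name (heuristic)."""
    root = cache_dir if cache_dir is not None else datasets_cache_dir()
    if not root.is_dir():
        return []
    name = repo_id.split("/")[-1]
    return sorted(p for p in root.iterdir() if name in p.name)


def remove_cached_entries(repo_id: str, cache_dir: Path | None = None,
                          dry_run: bool = True) -> list[Path]:
    """Print the matching entries and delete them only when dry_run is False."""
    matches = find_cached_entries(repo_id, cache_dir)
    for path in matches:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return matches
```

For example, remove_cached_entries("jamescalam/agent-conversations-retrieval-tool") only lists what would be deleted; review that list before rerunning with dry_run=False, since a name match can also catch unrelated entries.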