Why can't I download a dataset with the Gensim download API?

When I run the following:

>>> import gensim.downloader as api
>>> model = api.load("glove-twitter-25")  # load glove vectors

the gensim.downloader API raises the following error:

[Errno 2] No such file or directory: '/Users/vtim/gensim-data/information.json'.

What am I doing wrong?

There are 3 answers

Answer by Dilip H

I had both problems, the 'information.json' error and the certificate error, and was able to resolve them by following the steps in tommi123's answer. As a tip, you can also test the download from the command line:

python3 -m gensim.downloader -i word2vec-google-news-300

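Replace word2vec-google-news-300 with the name of the dataset you want, as listed in https://github.com/RaRe-Technologies/gensim-data/blob/master/list.json. Note that -i (--info) only prints information about the dataset; to actually download it, use -d (--download) instead.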
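Before running either command, you can check that a name really appears in the catalogue. The sketch below assumes, based on how gensim-data's list.json is laid out at the time of writing, that the file has top-level "corpora" and "models" objects keyed by dataset name; 'find_dataset' is a hypothetical helper, not a gensim function.

```python
import json
from typing import Optional


def find_dataset(catalogue: dict, name: str) -> Optional[str]:
    """Return "models" or "corpora" if `name` is listed there, else None.

    Hypothetical helper. Assumes the catalogue layout
    {"corpora": {...}, "models": {...}} (verify against your copy of list.json).
    """
    for section in ("models", "corpora"):
        if name in catalogue.get(section, {}):
            return section
    return None


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed layout; not real metadata.
    sample = json.loads('{"corpora": {"text8": {}}, "models": {"glove-twitter-25": {}}}')
    print(find_dataset(sample, "glove-twitter-25"))  # models
    print(find_dataset(sample, "glove-twitter-26"))  # None
```

To use it on the real catalogue, load your saved copy of list.json with json.load and pass the resulting dict.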

Answer by Hans

For me (on a Mac), all it took was running:

bash /Applications/Python*/Install\ Certificates.command

After that, the download worked without errors.
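To confirm that Python itself can verify HTTPS certificates, independent of gensim, you can make a plain urllib request. This is a minimal stdlib sketch; 'check_https' is a hypothetical helper. On an affected macOS install the failure usually shows up as CERTIFICATE_VERIFY_FAILED, which the helper reports separately from other network errors.

```python
import ssl
import urllib.error
import urllib.request


def check_https(url: str, timeout: float = 10.0) -> str:
    """Try to open `url` and return a short diagnosis (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok (HTTP {resp.status})"
    except urllib.error.URLError as exc:
        # urlopen wraps SSL failures in URLError; the original error is in .reason
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return f"certificate verification failed: {exc.reason}"
        return f"request failed: {exc.reason}"
    except OSError as exc:
        # e.g. a socket timeout raised while reading the response body
        return f"request failed: {exc}"
```

For example, check_https('https://github.com/RaRe-Technologies/gensim-data/blob/master/list.json') should return a string starting with 'ok' once the certificates are installed.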

Answer by tommi123

I had the same problem and solved it with the steps below. I'm using macOS, PyCharm, and virtualenv. I don't have much Python experience, but this is how I did it:

1.1 Create a folder named 'gensim-data' at '/Users/vtim/gensim-data' (that is, in your home directory). You can do this by running 'mkdir ~/gensim-data' in your terminal (the same place where you run pip install commands).
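If you prefer to create the folder from Python, this sketch does the same as 'mkdir ~/gensim-data'. It assumes gensim's default data folder is 'gensim-data' in your home directory, which is what the path in the error message suggests; 'ensure_data_dir' is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def ensure_data_dir(home: Optional[Path] = None) -> Path:
    """Create <home>/gensim-data if it is missing and return its path.

    Hypothetical helper. `home` defaults to the current user's home directory
    (assumed default location); it is a parameter only so the function can be
    tried on a scratch folder.
    """
    data_dir = (home or Path.home()) / "gensim-data"
    # exist_ok=True makes this a no-op when the folder already exists
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

Calling ensure_data_dir() with no arguments creates the folder in your real home directory.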

1.2 Then add the folder to your project as a content root (so that the code can access it). In PyCharm, open PyCharm -> Preferences from the main application menu (next to the Apple logo), go to Project -> Project Structure, and click 'Add Content Root' in the right-hand pane. Select the gensim-data folder you just created and add it.

1.3 You should now see the 'gensim-data' folder in your project, next to, for example, the venv folder if you are using virtualenv. Create a file named 'information.json' inside the 'gensim-data' folder, then copy the contents of this file into it: https://github.com/RaRe-Technologies/gensim-data/blob/master/list.json
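If you'd rather script this step, the sketch below copies a catalogue you have already saved locally into gensim-data/information.json, parsing it first so that a truncated copy or an HTML page saved by mistake is caught. The names are assumptions: 'list.json' stands for wherever you saved the contents of the link above, and 'install_catalogue' is a hypothetical helper.

```python
import json
from pathlib import Path


def install_catalogue(source: Path, data_dir: Path) -> Path:
    """Validate `source` as JSON and copy it to <data_dir>/information.json.

    Hypothetical helper; the target file name comes from the error message.
    """
    text = source.read_text(encoding="utf-8")
    json.loads(text)  # raises json.JSONDecodeError before anything is written
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "information.json"
    target.write_text(text, encoding="utf-8")
    return target


# Example (paths are assumptions; adjust to your setup):
# install_catalogue(Path("list.json"), Path.home() / "gensim-data")
```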

(The underlying problem is that the gensim.downloader API may not be able to write files to that directory, or may not be able to read them. In my case it could do neither.)
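To see whether that is what is happening on your machine, you can ask the operating system what the current user may do with the folder. A minimal sketch using os.access; 'check_access' is a hypothetical helper, and the default path assumes the ~/gensim-data location shown in the error message.

```python
import os
from pathlib import Path
from typing import Optional


def check_access(folder: Optional[Path] = None) -> dict:
    """Report whether `folder` exists and is readable/writable by this user.

    Hypothetical helper; defaults to ~/gensim-data (assumed location).
    """
    folder = folder or Path.home() / "gensim-data"
    return {
        "exists": folder.is_dir(),
        # X_OK on a directory is needed to reach the files inside it
        "readable": os.access(folder, os.R_OK | os.X_OK),
        "writable": os.access(folder, os.W_OK | os.X_OK),
    }
```

Printing check_access() shows all three values at once; a missing folder reports False for every key.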

If your code still doesn't work, do the next step:

2.1 I also had a problem where the API could not download the right files from the internet. This problem is solved here: https://stackoverflow.com/a/42098127/14075343. Find the folder named Python 3.8 (or whichever version you are using) in your Applications folder, open it, and double-click 'Install Certificates.command'. Alternatively, run this from the terminal: 'open /Applications/Python\ 3.8/Install\ Certificates.command'
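As far as I know, the 'Install Certificates.command' script shipped with the python.org macOS installer installs the certifi package and links its CA bundle to the file that Python's OpenSSL reads by default. You can see which file that is, and whether it exists yet, with the standard ssl module. A small sketch; 'describe_ca_bundle' is a hypothetical helper.

```python
import os
import ssl


def describe_ca_bundle() -> dict:
    """Show which CA file this Python's OpenSSL uses by default (hypothetical helper)."""
    paths = ssl.get_default_verify_paths()
    cafile = paths.openssl_cafile
    return {
        "openssl_cafile": cafile,
        # os.path.exists follows symlinks, so a broken link reports False
        "exists": os.path.exists(cafile),
        "is_symlink": os.path.islink(cafile),
        # ssl reports None here when the default CA file is absent
        "cafile_in_use": paths.cafile,
    }


if __name__ == "__main__":
    for key, value in describe_ca_bundle().items():
        print(f"{key}: {value}")
```

Running it before and after the certificate script should show whether the default CA file has been put in place.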

Now the code should work. If it still doesn't, you can try running these commands. I'm not sure whether they make a difference, but I ran them while searching for the solution:

sudo python3 -m pip install --upgrade gensim

sudo -H pip install virtualenv

sudo chown -R $USER /Users/$USER/Library/Caches/pip
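The last command hands ownership of pip's cache back to you, which presumably matters if an earlier 'sudo pip' run left files there owned by root. To check whether that applies before running chown, a POSIX-only sketch can list entries under a folder that belong to another user; 'not_owned_by_me' is a hypothetical helper, and the cache path in the example is the macOS one from the command above.

```python
import os
from pathlib import Path
from typing import List


def not_owned_by_me(root: Path, limit: int = 20) -> List[Path]:
    """Return up to `limit` paths under `root` (including root) owned by another user.

    Hypothetical helper. POSIX only (uses os.getuid). Symlinks are checked
    themselves via lstat, not followed; unreadable subfolders are skipped.
    """
    me = os.getuid()
    found: List[Path] = []

    def check(path: Path) -> bool:
        """Record `path` if someone else owns it; return True once `limit` is hit."""
        if path.lstat().st_uid != me:
            found.append(path)
        return len(found) >= limit

    if check(root):
        return found
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if check(Path(dirpath) / name):
                return found
    return found


# Example (macOS pip cache location, as in the command above):
# print(not_owned_by_me(Path.home() / "Library" / "Caches" / "pip"))
```

An empty list means everything under that folder already belongs to you, so the chown step would change nothing.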