Memory error while importing IMDb files using IMDbPY script


While importing the IMDb files into MySQL 5 using the MyISAM storage engine, I am getting the following memory error:

Traceback (most recent call last):
  File "/usr/local/bin/imdbpy2sql.py", line 3072, in <module>
    run()
  File "/usr/local/bin/imdbpy2sql.py", line 2937, in run
    readMovieList()
  File "/usr/local/bin/imdbpy2sql.py", line 1531, in readMovieList
    mid = CACHE_MID.addUnique(title, yearData)
  File "/usr/local/bin/imdbpy2sql.py", line 1135, in addUnique
    else: return self.add(key, miscData)
  File "/usr/local/bin/imdbpy2sql.py", line 1010, in add
    self[key] = c
  File "/usr/local/bin/imdbpy2sql.py", line 922, in __setitem__
    dict.__setitem__(self, key, counter)
MemoryError

This is on Ubuntu 14.04, on an AWS EC2 instance with 1 GB of memory. I first tried using this command:

imdbpy2sql.py --mysql-force-myisam -d /home/ubuntu/imdb-files/ -u mysql://admin:password@localhost/imdb

And also:

imdbpy2sql.py --mysql-force-myisam -d /home/ubuntu/imdb-files/ -u mysql://admin:password@localhost/imdb -c /home/ubuntu/imdb-files/csv

Both failed with the same memory error. Anyone know of a workaround?

UPDATE (6/20/2015):

It always produces this memory error at the same point. Here is the MySQL table status of the title table it is populating.

| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |
| title | MyISAM | 10 | Dynamic | 2699999 | 83 | 226543136 | 281474976710655 | 32410624 | 0 | 2700000 |

At that point the memory usage of imdbpy2sql.py is around 62%. I am not a Python person, so I am not sure how to debug this. Any input would be greatly appreciated.
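
For a rough sense of scale: the traceback ends in dict.__setitem__, so the script appears to keep an in-memory dictionary with one entry per title. The sketch below is not IMDbPY code. It builds a small sample dictionary with made-up title keys and integer IDs (both are assumptions about what the real cache holds) and extrapolates to roughly the 2.7 million rows shown in the table status above. The function name estimate_cache_bytes is hypothetical.

```python
import sys


def estimate_cache_bytes(n_entries, sample_size=10000):
    """Extrapolate the memory of a dict mapping title strings to integer IDs.

    The key format is a hypothetical stand-in; the real cache in
    imdbpy2sql.py may store different (and larger) keys and values.
    """
    sample = {}
    for i in range(sample_size):
        sample["Hypothetical Movie Title %d (2015)" % i] = i

    # Cost of the hash table itself, spread over the entries it holds.
    table_bytes = float(sys.getsizeof(sample))
    # Cost of the key and value objects the table points to.
    object_bytes = float(sum(sys.getsizeof(k) + sys.getsizeof(v)
                             for k, v in sample.items()))
    per_entry = (table_bytes + object_bytes) / sample_size
    return int(per_entry * n_entries)


if __name__ == "__main__":
    rows = 2700000  # approximate row count from the table status above
    megabytes = estimate_cache_bytes(rows) / (1024.0 ** 2)
    print("approx. %.0f MB for %d entries" % (megabytes, rows))
```

The figure is only an order-of-magnitude estimate for this one dictionary. The interpreter, the other caches in the script, and MySQL itself all need memory on top of it.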

Answer by Davide Alberani (best answer):

I fear that 1 GB of total RAM is not enough to run imdbpy2sql.py safely. You could try a different instance, or add a swap file to your system (swap will obviously slow everything down).
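
Before retrying, it may help to confirm how much RAM and swap the instance actually has. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo. The helper names parse_meminfo and ram_plus_swap_mb are made up for this example and are not part of IMDbPY.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def ram_plus_swap_mb(info):
    """Total physical RAM plus configured swap, in MB."""
    return (info.get("MemTotal", 0) + info.get("SwapTotal", 0)) / 1024.0


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print("RAM:   %d MB" % (info.get("MemTotal", 0) // 1024))
    print("Swap:  %d MB" % (info.get("SwapTotal", 0) // 1024))
    print("Total: %d MB" % ram_plus_swap_mb(info))
```

If SwapTotal is reported as 0, no swap is configured, and the import is limited to physical RAM alone.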