I am currently using SciPy 0.7.2 with NumPy 1.4.1. My Python version is 2.6.6. I have written a short script to read a COO sparse matrix from a .mtx file, as follows:
data = scipy.io.mmread('matrix.mtx')
On running the code, I got the following error:
Traceback (most recent call last):
File "read_mat.py", line 31, in <module>
data = scipy.io.mmread('matrix.mtx')
File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 52, in mmread
return MMFile().read(source)
File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 273, in read
return self._parse_body(stream)
File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 417, in _parse_body
flat_data = flat_data.reshape(-1,3)
ValueError: total size of new array must be unchanged
I checked some questions on SO and found that it might be a version-specific issue; however, according to this, it has been fixed in my version. Can anybody please tell me what I can do here? Thanks in advance!
EDIT: I tried opening a different file, and it was read successfully. So I guess the issue is with my file. I am pasting the top few lines of both files below:
The opened file:
%%MatrixMarket matrix coordinate integer general
%
1466983 1466983 655955608
1 1 3448
1 2 824
1 3 1492
1 4 132
1 5 426
The file which won't open:
%%MatrixMarket matrix coordinate integer general
%
11162 11162 233925
1 2 1
1 3 1
1 4 1
1 16 1
1 19 1
The last few lines of the traceback indicate the likely problem: the data file is read as a flat (1D) array, and then scipy tries to reshape that array to an (n, 3) array, which fails. That means the size of the flat array is not a multiple of three (you'd get the same error from np.ones(4).reshape(-1, 3)).
The fact that the size of the flat array is not a multiple of three means that somewhere on a row, a number is missing. That, or a row (or several rows) is malformed somehow. It may simply be that the last row is cut off; that would be easy to check.
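One quick way to test this hypothesis is to compare the number of whitespace-separated tokens in the body with three times the entry count given on the size line. The sketch below is only an illustration, not what scipy does internally; the function name count_body_tokens is made up here, and it assumes a coordinate file with a real or integer field (three tokens per entry), as in the files shown in the question.

```python
def count_body_tokens(lines):
    """Return (expected, actual) token counts for a coordinate-format body.

    Assumes three tokens per entry (row, column, value); hypothetical helper.
    """
    # Ignore blank lines and the banner/comment lines, which start with '%'.
    content = [ln for ln in lines
               if ln.strip() and not ln.lstrip().startswith('%')]
    # The first remaining line is the size line: rows, columns, entry count.
    rows, cols, nnz = (int(tok) for tok in content[0].split())
    # Everything after the size line is data.
    actual = sum(len(ln.split()) for ln in content[1:])
    return 3 * nnz, actual
```

Passing an open file object (for example open('matrix.mtx')) as lines should work, since files iterate line by line. If the two numbers differ, some row is short, long, or missing.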
If you're on *nix, you could for example use awk to show the lines that don't have 3 columns separated by whitespace.
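If awk isn't available, the same check could be sketched in Python. This is a minimal illustration under the same assumption of three whitespace-separated fields per data row; find_bad_lines is a hypothetical helper name, not part of scipy.

```python
def find_bad_lines(lines, ncols=3):
    """Return (line_number, text) pairs for data rows without ncols fields."""
    bad = []
    seen_size_line = False
    for lineno, line in enumerate(lines, 1):
        stripped = line.strip()
        # Skip blank lines and the banner/comment lines starting with '%'.
        if not stripped or stripped.startswith('%'):
            continue
        # The first non-comment line is the size line, not a data row.
        if not seen_size_line:
            seen_size_line = True
            continue
        if len(stripped.split()) != ncols:
            bad.append((lineno, line.rstrip('\n')))
    return bad
```

Calling it on an open file object and printing the result should list the offending rows together with their 1-based line numbers in the file.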
You could use awk to remove the bad rows as well.
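A rough Python counterpart for the removal step might look like the following. It is a sketch only: drop_bad_lines is a made-up name, it again assumes three fields per data row, and it does not update the entry count on the size line, which would no longer match the number of rows once some are dropped.

```python
def drop_bad_lines(lines, ncols=3):
    """Yield lines, skipping rows that don't have exactly ncols fields.

    Banner/comment lines and blank lines are passed through unchanged.
    The size line has three fields, so it is kept as well (but not updated).
    """
    for line in lines:
        stripped = line.strip()
        if (not stripped or stripped.startswith('%')
                or len(stripped.split()) == ncols):
            yield line
```

With src and dst as open file objects for the original and a new output file, dst.writelines(drop_bad_lines(src)) would write the filtered copy. Keep in mind that dropping rows discards data, so it is worth looking at the bad rows first.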