Scipy ValueError: Total size of new array must be unchanged


I am currently using Scipy 0.7.2 with Numpy 1.4.1 on Python 2.6.6. I wrote a short script to read a COO sparse matrix from a .mtx file:

import scipy.io
data = scipy.io.mmread('matrix.mtx')

On running the code, I got the following error:

Traceback (most recent call last):
  File "read_mat.py", line 31, in <module>
    data = scipy.io.mmread('matrix.mtx')
  File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 52, in mmread
    return MMFile().read(source)
  File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 273, in read
    return self._parse_body(stream)
  File "/usr/lib64/python2.6/site-packages/scipy/io/mmio.py", line 417, in _parse_body
    flat_data = flat_data.reshape(-1,3)
ValueError: total size of new array must be unchanged

I checked some questions on SO and found that this might be a version-specific issue. However, according to this, it has been fixed in my version. Can anybody please tell me what I can do here? Thanks in advance!

EDIT: I tried opening a different file, and it was read without problems, so I guess the issue is with my file. The first few lines of both files are pasted below.

The file that opens:

%%MatrixMarket matrix coordinate integer general
%
1466983 1466983 655955608
1 1 3448
1 2 824
1 3 1492
1 4 132
1 5 426

The file which won't open:

%%MatrixMarket matrix coordinate integer general
%
11162 11162 233925
1 2 1
1 3 1
1 4 1
1 16 1
1 19 1

AudioBubble On BEST ANSWER

The last few lines of the traceback point to the likely problem: the data in the file is read into a flat (1D) array, which scipy then tries to reshape into an (n, 3) array, and that fails. This means the number of values in the flat array is not a multiple of three (you would get the same error from np.ones(4).reshape(-1, 3)).
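The reshape failure can be reproduced in isolation. The following is a minimal sketch; try_reshape is a hypothetical helper name used only for illustration and is not part of scipy or numpy:

```python
import numpy as np

def try_reshape(n_values):
    """Return True if n_values numbers can be arranged into rows of 3."""
    flat = np.ones(n_values)
    try:
        flat.reshape(-1, 3)
        return True
    except ValueError:
        # Raised when n_values is not a multiple of 3
        return False

print(try_reshape(6))  # True: 6 values form two rows of 3
print(try_reshape(4))  # False: 4 values cannot form rows of 3
```

The exact wording of the ValueError message differs between numpy versions, but the cause is the same.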

If the number of values is not a multiple of three, then a number is missing somewhere in a row, or one or more rows are otherwise malformed. It may simply be that the last row is cut off, which would be easy to check.

If you're on *nix, you could for example use awk to check:

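awk 'NF != 3 { print NR ": " $0 }' matrix.mtx

This prints every line, together with its line number, that does not have exactly 3 columns separated by whitespace. The two header lines (starting with %) will be listed as well and can be ignored.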
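If awk is not available, the same check can be sketched in Python. This assumes that comment and header lines start with % and that every other line should have exactly three fields; find_bad_lines and the sample rows are hypothetical, and the malformed row is made up for the demonstration:

```python
def find_bad_lines(lines, expected_fields=3):
    """Return (line_number, text) pairs for data lines with the wrong field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith('%'):
            continue  # skip the header and comment lines
        if len(line.split()) != expected_fields:
            bad.append((number, line.rstrip('\n')))
    return bad

sample = [
    "%%MatrixMarket matrix coordinate integer general\n",
    "%\n",
    "11162 11162 233925\n",
    "1 2 1\n",
    "1 3\n",  # hypothetical malformed row with a missing value
]
print(find_bad_lines(sample))  # [(5, '1 3')]
```

For a real file, an open file object can be passed in place of the list, for example find_bad_lines(open('matrix.mtx')).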

You could use awk to remove the bad rows as well:

awk '(NF == 3 || NR < 3) { print $0 }' matrix.mtx > goodmatrix.mtx
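A rough Python counterpart of this filter is sketched below. Unlike the awk command, it keeps every line starting with % rather than the first two lines unconditionally; filter_good_lines and the sample rows are hypothetical. Note that it leaves the entry count on the size line untouched, so that number may need to be corrected by hand after rows are dropped:

```python
def filter_good_lines(lines, expected_fields=3):
    """Keep comment lines and lines with the expected field count.

    Returns the kept lines and the number of dropped lines.
    """
    kept, dropped = [], 0
    for line in lines:
        if line.startswith('%') or len(line.split()) == expected_fields:
            kept.append(line)
        else:
            dropped += 1
    return kept, dropped

sample = [
    "%%MatrixMarket matrix coordinate integer general\n",
    "%\n",
    "11162 11162 233925\n",
    "1 2 1\n",
    "1 3\n",  # hypothetical malformed row, will be dropped
]
kept, dropped = filter_good_lines(sample)
print(dropped)  # 1
```

The kept lines can then be written out with the writelines method of a file opened for writing, for example to goodmatrix.mtx.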