from numpy.random import rand
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix
from scipy.linalg import norm
w = (rand(1,10)<0.25)*rand(1,10)
x = (rand(1,10)<0.25)*rand(1,10)
w_csr = csr_matrix(w)
x_csr = csr_matrix(x)
(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()
norm(w,ord='fro')*norm(x,ord='fro')
I am working with scipy csr_matrix and would like to normalize two matrices using the Frobenius norm and then take their product. But norm from scipy.linalg and normalize from sklearn.preprocessing seem to evaluate the matrices differently. Since in both cases I am technically computing the same Frobenius norm, shouldn't the two expressions evaluate to the same thing? Instead I get the following answers:
matrix([[ 0.962341]])
0.4431811178371029
for sklearn.preprocessing and scipy.linalg.norm, respectively. I would really like to know what I am doing wrong.
sklearn.preprocessing.normalize divides each row by its norm; it returns a matrix with the same shape as its input. scipy.linalg.norm returns a single number, the norm of the whole matrix. So your calculations are not equivalent.

Note that your code is not correct as written. This line

(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

raises

ValueError: dimension mismatch
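To make the difference between the two functions concrete, here is a small sketch with a hand-picked matrix (the values are mine, not from the question):

```python
import numpy as np
from scipy.linalg import norm
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

w = np.array([[3.0, 0.0, 4.0, 0.0]])
w_csr = csr_matrix(w)

# normalize scales each row to unit L2 norm; the (1, 4) shape is preserved
unit = normalize(w_csr, axis=1, norm='l2')

# norm collapses the whole matrix to one number: sqrt(3**2 + 4**2) = 5.0
fro = norm(w, ord='fro')
print(fro)  # 5.0
```

Note that for a single-row matrix the row's L2 norm and the Frobenius norm coincide, which is why the entries of unit are exactly w / 5.0 here.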
The two calls to normalize both return matrices with shape (1, 10), so their dimensions are not compatible for a matrix product. What did you do to get matrix([[ 0.962341]])?

Here's a simple function to compute the Frobenius norm of a sparse (e.g. CSR or CSC) matrix:
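A minimal sketch of such a function (the helper name spnorm is my own choice): since only the explicitly stored entries of a sparse matrix can be nonzero, it is enough to sum the squares of a.data.

```python
import numpy as np
from scipy.sparse import csr_matrix

def spnorm(a):
    # Frobenius norm: sqrt of the sum of squared entries.
    # For a sparse matrix, only the stored values in a.data can be nonzero.
    return np.sqrt((a.data ** 2).sum())

# For example:
a = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 2.0, 0.0]]))
print(spnorm(a))  # sqrt(1 + 4 + 4) = 3.0
```

For a CSR or CSC matrix this agrees with scipy.linalg.norm(a.toarray(), ord='fro') computed on the dense equivalent.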