from numpy.random import rand
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix
from scipy.linalg import norm
w = (rand(1,10)<0.25)*rand(1,10)
x = (rand(1,10)<0.25)*rand(1,10)
w_csr = csr_matrix(w)
x_csr = csr_matrix(x)
(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()
norm(w,ord='fro')*norm(x,ord='fro')
I am working with scipy csr_matrix and would like to normalize two matrices using the Frobenius norm and then take their product. But norm from scipy.linalg and normalize from sklearn.preprocessing seem to evaluate the matrices differently. Since in both cases I am technically computing the same Frobenius norm, shouldn't the two expressions evaluate to the same thing? Instead I get the following answers:
matrix([[ 0.962341]])
0.4431811178371029
for sklearn.preprocessing and scipy.linalg.norm, respectively. I would really like to know what I am doing wrong.
sklearn.preprocessing.normalize divides each row by its norm; it returns a matrix with the same shape as its input. scipy.linalg.norm returns a single number, the norm of the whole matrix. So your calculations are not equivalent.

Note that your code is not correct as written. This line

(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

raises

ValueError: dimension mismatch
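To make the difference between the two functions concrete, here is a small sketch with a hand-picked matrix (the values are mine, not from the question):

```python
import numpy as np
from scipy.linalg import norm
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

w = np.array([[3.0, 0.0, 4.0, 0.0]])
w_csr = csr_matrix(w)

# normalize scales each row to unit L2 norm; the (1, 4) shape is preserved
unit = normalize(w_csr, axis=1, norm='l2')

# norm collapses the whole matrix to one number: sqrt(3**2 + 4**2) = 5.0
fro = norm(w, ord='fro')
print(fro)  # 5.0
```

Note that for a single-row matrix the row's L2 norm and the Frobenius norm coincide, which is why the entries of unit are exactly w / 5.0 here.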
The two calls to normalize both return matrices with shape (1, 10), so their dimensions are not compatible for a matrix product. What did you do to get matrix([[ 0.962341]])?

Here's a simple function to compute the Frobenius norm of a sparse (e.g. CSR or CSC) matrix:
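A minimal sketch of such a function (the helper name spnorm is my own choice): since only the explicitly stored entries of a sparse matrix can be nonzero, it is enough to sum the squares of a.data.

```python
import numpy as np
from scipy.sparse import csr_matrix

def spnorm(a):
    # Frobenius norm: sqrt of the sum of squared entries.
    # For a sparse matrix, only the stored values in a.data can be nonzero.
    return np.sqrt((a.data ** 2).sum())

# For example:
a = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 2.0, 0.0]]))
print(spnorm(a))  # sqrt(1 + 4 + 4) = 3.0
```

For a CSR or CSC matrix this agrees with scipy.linalg.norm(a.toarray(), ord='fro') computed on the dense equivalent.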