Dot product between 1D numpy array and scipy sparse matrix

Say I have a NumPy array p and a SciPy sparse matrix q such that

>>> p.shape
(10,)
>>> q.shape
(10,100)

I want to do a dot product of p and q. When I try with numpy I get the following:

>>> np.dot(p,q)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-96-8260c6752ee5>", line 1, in <module>
    np.dot(p,q)
ValueError: Cannot find a common data type.

I see in the Scipy documentation that

As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore using it will result on unexpected results or errors. The corresponding dense matrix should be obtained first instead

But that defeats the purpose of using a sparse matrix. So how do I compute the dot product of a sparse matrix and a 1D NumPy array (or a NumPy matrix; I am open to either) without losing the sparsity of my matrix?

I am using Numpy 1.8.2 and Scipy 0.15.1.

3

There are 3 answers

1
user2357112 On BEST ANSWER

Use *:

p * q

Note that * uses matrix-like semantics rather than array-like semantics for sparse matrices, so it computes a matrix product rather than a broadcasted product.
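A minimal sketch of that difference, assuming q is a scipy.sparse.csr_matrix. The names rng and dense and the roughly 10% density are illustrative, not from the question. It also shows @, since the newer SciPy sparse array classes (csr_array and friends) treat * as elementwise, while @ means matrix product for both kinds of object.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative data: a (10,) dense vector and a (10, 100) sparse matrix.
rng = np.random.default_rng(0)
p = rng.random(10)
dense = rng.random((10, 100))
dense[dense < 0.9] = 0.0          # keep roughly 10% of the entries
q = csr_matrix(dense)

matrix_product = p * q            # matrix semantics: dense (100,) array
same_product = p @ q              # explicit matrix product, same values

# The broadcasted (elementwise) product has its own method and stays sparse.
elementwise = q.multiply(p[:, None])   # scales row i of q by p[i]

print(matrix_product.shape, elementwise.shape)
```

If q were a csr_array instead, p * q would no longer be a matrix product, so @ is probably the safer habit in new code.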

0
hpaulj On

A sparse matrix is not a numpy array or matrix, though most formats use several arrays to store their data. As a general rule, regular numpy functions aren't aware of sparse matrices, so you should count on using the sparse versions of functions and operators.

By popular demand, the latest np.dot is sparse-aware, though I don't know the details of how it handles sparse inputs. In 1.18 we have several options.

user2357112 suggests p*q. With the dense array first, I was a little doubtful, wondering whether it would try array-style element-by-element multiplication (and fail with a broadcasting error). But it works: operators like * can pass control to the second operand (through its __rmul__ method). Just to be sure, I tried several alternatives:

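q.T * p                    # sparse (100,10) times dense (10,)
np.dot(p, q.toarray())     # densify q first (q.A is shorthand for this)
q.T.dot(p)                 # the sparse matrix's own dot method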

All three give the same dense (100,) array. Note that the result is a dense array, not a sparse matrix.

To get a sparse matrix I need to use

sparse.csr_matrix(p)*q   # (1,100) shape
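A small check of that claim, as a sketch rather than a benchmark. The test data (rng, dense, about 10% density) is made up for illustration, and @ is used in place of * only because it means matrix product in every SciPy sparse class.

```python
import numpy as np
from scipy import sparse

# Illustrative data with the shapes from the question.
rng = np.random.default_rng(0)
p = rng.random(10)
dense = rng.random((10, 100))
dense[dense < 0.9] = 0.0
q = sparse.csr_matrix(dense)

p_row = sparse.csr_matrix(p)       # a 1-D array becomes a (1, 10) sparse row
sparse_result = p_row @ q          # sparse times sparse stays sparse
dense_result = q.T.dot(p)          # dense (100,) array, for comparison

print(sparse.issparse(sparse_result), sparse_result.shape)
print(np.allclose(sparse_result.toarray().ravel(), dense_result))
```

Whether the sparse result is worth having depends on how sparse p and q are; with a fully dense p the (1,100) product will often be mostly nonzero anyway.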

q could be in another sparse format, but for calculations like this it is converted to csr or csc. The .T operation is cheap because it just requires switching the format from csr to csc.
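The format switch can be seen directly. This is a sketch on a tiny made-up matrix; the shares_buffer check is my own addition to look at whether the transpose reuses the data buffer, and I would not rely on that detail across SciPy versions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Tiny illustrative matrix, stored in CSR format.
q = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
qt = q.T

# Transposing CSR yields CSC: the same three arrays, reinterpreted.
formats = (q.format, qt.format)

# Implementation detail, not a documented guarantee: does qt reuse q's data?
shares_buffer = np.shares_memory(q.data, qt.data)

print(formats, qt.shape, shares_buffer)
```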

It would be a good idea to check whether these alternatives work if p is a 2d array, e.g. (2,10).
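Here is one way to run that check, as a sketch with made-up data (p2, rng and dense are illustrative names). It assumes q is a csr_matrix. With a 2d p the transposed forms need p transposed as well, since q.T is (100,10) and can only multiply something with 10 rows.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative data: a (2, 10) dense array and a (10, 100) sparse matrix.
rng = np.random.default_rng(0)
p2 = rng.random((2, 10))
dense = rng.random((10, 100))
dense[dense < 0.9] = 0.0
q = csr_matrix(dense)

left = p2 @ q                      # dense (2, 10) times sparse (10, 100)
via_transpose = q.T.dot(p2.T).T    # (100, 10) . (10, 2), transposed back
reference = p2 @ q.toarray()       # fully dense computation for comparison

print(np.asarray(left).shape, np.asarray(via_transpose).shape)
```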

1
Aditya On

SciPy has built-in methods for sparse matrix multiplication.

Example from documentation:

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> Q = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
>>> p = np.array([1, 0, -1])
>>> Q.dot(p)
array([ 1, -3, -1], dtype=int64)
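Note that Q.dot(p) computes Q·p, with the vector on the right, while the question asks for p·q with the vector on the left. A sketch of both on the same small matrix; transposing Q is my adaptation, not part of the documentation example.

```python
import numpy as np
from scipy.sparse import csr_matrix

Q = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
p = np.array([1, 0, -1])

right = Q.dot(p)      # Q·p: vector on the right  -> [ 1, -3, -1]
left = Q.T.dot(p)     # p·Q: vector on the left   -> [-3,  2, -5]

print(right, left)
```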

Check these resources:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.dot.html
http://docs.scipy.org/doc/scipy/reference/sparse.html