I wrote a program and have been profiling it. The bottlenecks are the following (if I use a sparse matrix):
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  26534    0.775    0.000   66.657    0.003 compressed.py:638(__setitem__)
  26534    2.240    0.000   59.438    0.002 compressed.py:688(_set_many)
  13318    2.993    0.000   50.024    0.004 compressed.py:742(_insert_many)
3034231   23.087    0.000   38.101    0.000 defmatrix.py:312(__getitem__)
If I use a dense matrix instead, these operations are slow (the matrix has to be initialized to zeros):
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
3034072   24.902    0.000   41.135    0.000 defmatrix.py:312(__getitem__)
  11780   19.586    0.002   19.586    0.002 {numpy.core.multiarray.zeros}
The sparse matrix version is faster overall (178 s versus 193 s for dense), but retrieving and inserting rows is clearly a bottleneck for me. I tried using the take function, using range() to build an array of the row's indices, but that is much worse (by a factor of 10000) than what I am currently doing: for a matrix X, X[idx,:] for putting and X.getrow(idx).todense() for taking.
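Concretely, the put/take pattern looks like this (toy sizes and variable names, just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

# toy sizes; the real matrices are ~100000 rows, 20-500 cols
X = csr_matrix(np.random.rand(1000, 200))
idx = 42

row = X.getrow(idx).todense()       # "taking": a dense 1 x 200 matrix
X[idx, :] = np.random.rand(1, 200)  # "putting": the __setitem__ hotspot above
```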
Is there a better way to access and replace these rows? My matrices are quite large (~100000 rows 20-500 cols).
Edit: I am using csr_matrix (but I'm open to any kind of sparse matrix - this one seemed suited to grabbing rows). Below is a series of tests as an MWE. The times are roughly 3e-4 s, 7e-3 s, and 0.1 s respectively. That last number surprises me, and I wonder if there is a faster way than the top (fancy-indexing) approach. If I remove the todense() call, the sparse time is cut in half - but it still seems slow.
import numpy as np
from time import time
from scipy.sparse import csr_matrix
def testFancy(mat, idxs):
    for i in idxs:
        x = mat[i, :]

def testTake(mat, idxs):
    for i in idxs:
        x = mat.take(range(i*50, i*50+50))

def testSparse(mat, idxs):
    for i in idxs:
        x = mat.getrow(i).todense()
mat = np.random.rand(50000, 50)
idxs = np.random.randint(50000-1, size=1000)

#fancy
t0 = time()
testFancy(mat, idxs)
t1 = time()
print(t1 - t0)

#take
t0 = time()
testTake(mat, idxs)
t1 = time()
print(t1 - t0)

#sparse
mat = csr_matrix((50000, 50))
t0 = time()
testSparse(mat, idxs)
t1 = time()
print(t1 - t0)
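One extra comparison point, assuming the real application could batch the lookups (which it may not be able to): grabbing all the rows in a single fancy-indexing call instead of one getrow() per index:

```python
import numpy as np
from time import time
from scipy.sparse import csr_matrix

mat = csr_matrix((50000, 50))
idxs = np.random.randint(50000 - 1, size=1000)

t0 = time()
rows = mat[idxs, :].toarray()  # one call for all 1000 rows
t1 = time()
print(t1 - t0)
```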
I tried using a dok_matrix. The result on my test is 0.03 s, and the overall application runs in 66 s.
This seems to be a nice compromise - but I do wonder if it could be better.