I'm not very skilled in Python. However, I'm pretty handy with R. Yet, I do have to use Python since it has an up-to-date interface with Cplex. I'm also trying to avoid all the extra coding I would have to do in C/C++
That being said, I have issues with speed and efficiency on large lists/arrays/matrices...
I quickly wrote two lines in R, which are pretty ugly but work somewhat well...
doses = sapply(organSets[[2]], function(x) sum(voxMap_beamlet_val[which(voxMap_beamlet_iInd[,1] == x),1]))
length(which(doses <= sum(doses)/length(organSets[[2]])))/length(doses)
... these lines execute in about 5 minutes for length(organSets[[2]])=52960
and length(voxMap_beamlet_val) = length(voxMap_beamlet_iInd) = 1217077
However, in Python, something similar takes about two hours. By using scipy.sparse, the execution time gets reduced by half, to about an hour... which is still obviously unacceptable...
Python code as follows...
voxMapBeamlet = sparse.coo_matrix((voxMap_beamlet_val,(voxMap_beamlet_iInd,voxMap_beamlet_jInd)),shape=(1055736,8500))
def getDose(voxInd):
return sum([x.sum() for x in voxMapBeamlet.getrow(voxInd)])
def probDoseLevel2(orgVox,level=None):
voxDose2 = [0] * len(orgVox)
for i,v in enumerate(orgVox):
voxDose2[i] = getDose(v)
if level == None:
level = sum(voxDose2)/len(orgVox)
print len([x for x in voxDose2 if x <= level])/len(orgVox)
probDoseLevel2(organSets[1])
Please advise.
First, you want efficient access to rows, and COO format doesn't do that. Convert your
voxMapBeamlet
to Compressed Sparse Row format, andgetrow
becomes much more efficient:Second,
getDose
is a lot more complicated than it needs to be:I suspect this will already be fast enough, but we should be able to push it further. We can push more of the work out of Python bytecode and into C-level routines using broadcasting and advanced indexing:
Finally, if you're on Python 2, the division in your last line is integer division. That'll always produce 0 or 1 here, since
len(orgVox)
is at least as big as the numerator. There may be a similar problem with the division to producelevel
, depending on the dtype of the data you're working with. To turn on true division, you can putat the top of your file.