Python: working with huge vectors

I am still learning to program in Python and I have a problem with operations on big arrays. I have a huge M x N array where M = 3,500,000 and N = 17,000,000. For each vector in range(0, M) I need to iterate through all values in range(0, N) and do something. My problem is that this takes a very long time: I think it took somewhere between 5 and 10 minutes to process only the first item.

Could you please let me know how I can speed things up?

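from operator import itemgetter
from scipy import spatial

sizeV = 17_000_000  # number of vectors to compare against

def most_similar(i, topn=10):
    # vect is the big array described above
    sim_list = []
    for k in range(sizeV):
        # correlation distance: 0 means perfectly correlated
        result_b = spatial.distance.correlation(vect[i], vect[k])
        sim_list.append((k, result_b))
    # smallest distance first, so the most similar vectors lead the list
    L = sorted(sim_list, key=itemgetter(1))
    return L[:topn]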

There is 1 answer

Alexander Davydov On

Just use NumPy. It is designed for large arrays and matrices, and its vectorized operations run in compiled code, so they avoid the overhead of a Python-level loop over every element.
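
As a rough illustration of what "use NumPy" could mean for the function in the question, here is a minimal sketch that computes the correlation distance from one row to every other row with a single matrix-vector product instead of a loop. It rests on several assumptions that are not in the question: `vect` is a 2-D float array that fits in memory, no row is constant (a constant row has zero variance and would produce a division by zero), and the query row itself should be left out of the result. The name `most_similar` is reused from the question; the `vect` argument and the demo data are only for illustration.

```python
import numpy as np

def most_similar(vect, i, topn=10):
    """Return (index, distance) pairs for the topn rows closest to row i."""
    # Center each row: the correlation of two rows is the cosine of the
    # angle between their centered versions.
    centered = vect - vect.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    # One matrix-vector product replaces the Python loop over k.
    corr = centered @ centered[i] / (norms * norms[i])
    dist = 1.0 - corr          # correlation distance, as in SciPy
    dist[i] = np.inf           # assumption: skip the query row itself
    topn = min(topn, len(dist) - 1)
    # argpartition picks the topn smallest distances without a full sort.
    idx = np.argpartition(dist, topn)[:topn]
    idx = idx[np.argsort(dist[idx])]   # order only those topn entries
    return [(int(k), float(dist[k])) for k in idx]

# Hypothetical demo data, far smaller than the array in the question.
rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 64))
print(most_similar(demo, 0, topn=5))
```

Centering and the norms do not depend on `i`, so if many rows are queried they could be computed once and reused. A dense 3,500,000 x 17,000,000 float array would take hundreds of terabytes, so at that size the data would probably have to be sparse or processed in blocks (for example through `np.memmap`); the sketch does not cover that.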