How to make item-based collaborative filtering run faster?


I am trying to find the similarity between each pair of items. The items are in a Python dictionary, and I compute the similarity one pair at a time. The code is:

def allSimilarity(itemsDict, similarityMetric):
    itemList = itemsDict.keys()
    itemSimilarityDict = {}
    for item1 in itemList:
        itemSimilarityDict[item1] = {}
        for item2 in itemList:
            # Skip self-comparison.
            if item1 == item2:
                continue
            # Note: both (item1, item2) and (item2, item1) are computed,
            # even if similarityMetric is symmetric.
            itemSimilarityDict[item1][item2] = similarityMetric(itemsDict, item1, item2)
    return itemSimilarityDict

The problem is that each iteration of the outer loop takes about 5 seconds, so with ~300,000 items the whole computation would take ~18 days. Is there any way to increase the speed? Can I use packages like Theano or TensorFlow and run this on a GPU? Or could I rent machines in the cloud and parallelize the process?

1 Answer

Yao Zhang (Best Answer):

I don't think a machine learning library would be particularly helpful here unless it has operations or building blocks readily available for this type of all-to-all similarity comparison.

I think you'd have better luck looking at more generic parallelization solutions: OpenMP, TBB, MapReduce, AVX, CUDA, MPI, etc.
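As a minimal sketch of the parallelization idea in pure Python, here is one way to spread the pairwise computation across processes with the standard multiprocessing module. It assumes similarityMetric is a top-level (picklable) function and that itemsDict fits in each worker's memory; the names mirror the question's code, and allSimilarityParallel, _init_worker and _pairSimilarity are helpers invented for this sketch:

import itertools
from multiprocessing import Pool

# Worker-side globals, set once per worker by _init_worker so the large
# itemsDict is shipped to each process once rather than with every task.
_itemsDict = None
_metric = None

def _init_worker(itemsDict, similarityMetric):
    global _itemsDict, _metric
    _itemsDict = itemsDict
    _metric = similarityMetric

def _pairSimilarity(pair):
    item1, item2 = pair
    return item1, item2, _metric(_itemsDict, item1, item2)

def allSimilarityParallel(itemsDict, similarityMetric, workers=8):
    itemList = list(itemsDict.keys())
    itemSimilarityDict = {item: {} for item in itemList}
    # combinations() yields each unordered pair exactly once, which
    # halves the work whenever the metric is symmetric.
    pairs = itertools.combinations(itemList, 2)
    with Pool(workers, initializer=_init_worker,
              initargs=(itemsDict, similarityMetric)) as pool:
        for item1, item2, sim in pool.imap_unordered(_pairSimilarity,
                                                     pairs, chunksize=10000):
            itemSimilarityDict[item1][item2] = sim
            itemSimilarityDict[item2][item1] = sim  # mirror; assumes symmetry
    return itemSimilarityDict

Keep in mind this only buys a linear speedup: 300,000 items means roughly 4.5 × 10^10 unordered pairs, and a dict-of-dicts result of that size would not fit in memory anyway. In practice you would also need to shrink the problem, e.g. by only comparing items that could plausibly be similar or by keeping just the top-k neighbors per item.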

Also, rewriting the same code in C++ would very likely speed things up.
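Short of a full C++ rewrite, much of that compiled-code speedup is also reachable from Python through NumPy vectorization. A minimal sketch, assuming (the question doesn't say what similarityMetric is) that each item can be represented as a fixed-length numeric vector and that the metric is cosine similarity:

import numpy as np

def allCosineSimilarity(itemsDict):
    # Stack the item vectors into one (numItems x numFeatures) matrix;
    # assumes every value in itemsDict is an equal-length numeric vector.
    itemList = list(itemsDict.keys())
    X = np.asarray([itemsDict[item] for item in itemList], dtype=np.float64)
    # Normalize each row, then a single matrix product yields all
    # pairwise cosines at once, in compiled BLAS code.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1e-12)  # guard against all-zero vectors
    S = X @ X.T  # S[i, j] is the cosine of itemList[i] and itemList[j]
    return itemList, S

Note that the full 300,000 × 300,000 float64 matrix would be around 720 GB, so at that scale you would compute S in row blocks and keep only the top-k most similar items per row.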