How to successfully run an ML algorithm with a medium-sized data set on a mediocre laptop?

Question

264 views Asked by Fenil At 04 September 2017 at 06:38

I have a Lenovo IdeaPad laptop with 8 GB of RAM and an Intel Core i5 processor. I have 60k data points, each 100-dimensional. I want to do KNN, and for that I am running the LMNN algorithm to learn a Mahalanobis metric.
The problem is that after about 2 hours of running, a blank screen appears on my Ubuntu machine. I can't work out what is going wrong. Is my memory filling up, or is it something else?
Is there some way to optimize my code?

My dataset: data
My LMNN implementation:

import numpy as np
import sys
from modshogun import LMNN, RealFeatures, MulticlassLabels
from sklearn.datasets import load_svmlight_file

def main(): 

    # Get training file name from the command line
    traindatafile = sys.argv[1]

    # The training file is in libSVM format
    tr_data = load_svmlight_file(traindatafile)

    Xtr = tr_data[0].toarray()  # Convert the sparse matrix to a dense array
    Ytr = tr_data[1]  # The training labels

    # Cast data to Shogun format to work with LMNN
    features = RealFeatures(Xtr.T)
    labels = MulticlassLabels(Ytr.astype(np.float64))



    # Number of target neighbours per example - tune this using validation
    k = 18

    # Initialize the LMNN package
    lmnn = LMNN(features, labels, k)
    init_transform = np.eye(Xtr.shape[1])

    # Cap the number of iterations (this is an iteration limit, not a time limit)
    lmnn.set_maxiter(200000)
    lmnn.train(init_transform)

    # Let LMNN do its magic and return a linear transformation
    # corresponding to the Mahalanobis metric it has learnt
    L = lmnn.get_linear_transform()
    M = np.matrix(np.dot(L.T, L))

    # Save the model for use in testing phase
    # Warning: do not change this file name
    np.save("model.npy", M) 

if __name__ == '__main__':
    main()

There is 1 answer

**Jakub Bartczuk** · Answer 1 · 2017-09-04T18:46:43+00:00

Exact k-NN scales poorly: a brute-force query compares the query point against every training point.
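To get a feel for the numbers, here is a rough back-of-the-envelope sketch using the sizes from the question (60k points, 100 dimensions, float64). Whether Shogun's LMNN actually materializes a full pairwise distance matrix is an assumption I have not checked; the point is only that anything quadratic in the number of points will not fit in 8 GB of RAM.

```python
def dense_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols array of itemsize-byte values."""
    return rows * cols * itemsize


# Sizes taken from the question; float64 values are 8 bytes each
n_points, n_dims = 60_000, 100

data_gb = dense_bytes(n_points, n_dims) / 1e9
pairwise_gb = dense_bytes(n_points, n_points) / 1e9

print(f"data matrix:     {data_gb:.3f} GB")
print(f"pairwise matrix: {pairwise_gb:.1f} GB")
```

The data itself is tiny (about 0.048 GB), while a dense 60k x 60k matrix would need about 28.8 GB, so a machine that starts swapping heavily and becomes unresponsive is at least a plausible explanation.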

Scikit-learn has a documentation page on scaling strategies that covers what to do in this situation. Many estimators have a partial_fit method for incremental learning, but unfortunately kNN does not.
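As a minimal sketch of what partial_fit looks like in practice, the snippet below trains an SGDClassifier in mini-batches. Note that this is a linear model swapped in for illustration, not kNN and not LMNN, and the data is synthetic (the `make_batch` helper and the class centers are made up for this example, not the asker's dataset).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 well-separated classes in 100 dimensions
n_classes, n_dims = 3, 100
centers = rng.normal(scale=5.0, size=(n_classes, n_dims))


def make_batch(batch_size):
    """Hypothetical helper: draw one labelled mini-batch around the class centers."""
    y = rng.integers(0, n_classes, size=batch_size)
    X = centers[y] + rng.normal(size=(batch_size, n_dims))
    return X, y


clf = SGDClassifier(random_state=0)
classes = np.arange(n_classes)  # partial_fit needs the full label set up front

# Only one mini-batch is held in memory at a time
for _ in range(20):
    X_batch, y_batch = make_batch(500)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test, y_test = make_batch(1000)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

With a real data set, each batch would be read from disk instead of generated, so peak memory is bounded by the batch size rather than by the full data set.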

If you are willing to trade some accuracy for speed, you can use something like approximate nearest neighbors.
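To illustrate the idea, here is a toy approximate nearest-neighbor index based on random-hyperplane hashing, written with NumPy only. The class name and its parameters (`n_bits`, `n_tables`) are made up for this sketch. Random hyperplanes are really suited to angular similarity, so treating them as a Euclidean shortcut is an assumption that works best on roughly centered data. In practice you would probably reach for a dedicated library such as Annoy or FAISS rather than this.

```python
from collections import defaultdict

import numpy as np


class RandomHyperplaneIndex:
    """Toy approximate nearest-neighbor index using random-hyperplane hashing."""

    def __init__(self, n_dims, n_bits=8, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # One independent set of random hyperplanes per hash table
        self.planes = rng.normal(size=(n_tables, n_bits, n_dims))
        self.weights = 1 << np.arange(n_bits)
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = None

    def _keys(self, X):
        # Sign pattern of each point against each table's hyperplanes,
        # packed into one integer key per (table, point)
        bits = np.einsum("tbd,nd->tnb", self.planes, X) > 0
        return bits.astype(np.int64) @ self.weights

    def fit(self, X):
        self.data = np.asarray(X, dtype=np.float64)
        for table, keys in zip(self.tables, self._keys(self.data)):
            for i, key in enumerate(keys):
                table[int(key)].append(i)
        return self

    def query(self, x, k=5):
        x = np.asarray(x, dtype=np.float64)
        keys = self._keys(x[None, :])[:, 0]
        # Candidates: points sharing a bucket with x in at least one table
        candidates = set()
        for table, key in zip(self.tables, keys):
            candidates.update(table.get(int(key), []))
        if not candidates:
            return np.array([], dtype=int)
        idx = np.fromiter(candidates, dtype=int)
        # Exact distances are computed only for the candidate subset
        dists = np.linalg.norm(self.data[idx] - x, axis=1)
        return idx[np.argsort(dists)[:k]]


rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 100))  # synthetic data, not the asker's dataset
index = RandomHyperplaneIndex(n_dims=100).fit(X)
neighbors = index.query(X[0], k=5)
print(neighbors)
```

The speed-up comes from computing exact distances only for points that share a hash bucket with the query. The cost is that a true neighbor can be missed if it never lands in the same bucket, which is the accuracy-for-speed trade mentioned above.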
