Why is smogn extremely slow?


I am using smogn's smoter to balance my data for regression. I have 130k samples, 3 feature columns, and 1 target column. smoter is taking ages to balance the data, whereas SMOTE from imblearn for classification took seconds. Am I doing something wrong, or is it just the size of the data? The time estimated by smoter is around 20 h to balance all the data. I also checked how long it would take for a subset, e.g. 20 % of the data so 13k samples, and the estimated time was around 2 h.

import smogn
smogn.smoter(
    
    ## main arguments
    data = df_gonzalez_healthy,  ## pandas dataframe
    y = 'healthy',               ## string ('header name')
    k = 9,                       ## positive integer (k < n)
    samp_method = 'extreme',  ## string ('balance' or 'extreme')

    ## phi relevance arguments
    rel_thres = 0.80,         ## positive real number (0 < R < 1)
    rel_method = 'auto',      ## string ('auto' or 'manual')
    rel_xtrm_type = 'high',   ## string ('low' or 'both' or 'high')
    rel_coef = 2.25           ## positive real number (0 < R)
)

There is 1 answer

Answer by Radhakrishna S P

I don't think you're doing anything wrong; many smogn users run into the same slowness.

The likely cause is that the implementation relies on a lot of Python for loops.

The author/developer has already said he's working on making smogn more efficient.
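To illustrate why heavy use of for loops can hurt so much, here is a minimal sketch. It is not smogn's actual code. It assumes that a neighbor-based oversampler needs pairwise distances between rows, and compares a nested Python loop with NumPy broadcasting on a toy array of 300 rows and 3 features. The function names are made up for this example.

```python
import time
import numpy as np

def pairwise_dist_loops(X):
    """Euclidean distance matrix built with nested Python loops (slow)."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))
    return D

def pairwise_dist_vectorized(X):
    """Same distance matrix computed in one shot with NumPy broadcasting."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, n_features)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data: 300 rows, 3 feature columns (hypothetical, not the asker's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))

t0 = time.perf_counter()
D_loops = pairwise_dist_loops(X)
t_loops = time.perf_counter() - t0

t0 = time.perf_counter()
D_vec = pairwise_dist_vectorized(X)
t_vec = time.perf_counter() - t0

print(f"loops: {t_loops:.3f}s, vectorized: {t_vec:.3f}s")
```

Both functions return the same matrix, but the loop version makes n * n trips through the Python interpreter, so its runtime grows quickly with the number of rows. Note that the broadcasting version allocates an (n, n, n_features) array, so for something like 130k rows it would have to be processed in chunks.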