Building Pipelines


I have recently been trying to set up a Pipeline to produce a machine learning model. I have built my own data preprocessing classes and a new class with an optimized sklearn algorithm, Regresor_Model. However, when I declare the pipeline steps, for example:

from source.preprocessing_functions import Change_Data_Type, Years_Passed, Duplicated_Data
from source.preprocessing_functions import One_Hot_Encoding_Train, Standard_Scaling_Train, Reduce_Memory_Usage
from source.machine_learning_toolbox import Regresor_Model
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Loading the data
# ================
data = lp.load_data(config.DATA,config.ID_VAR)
X, y = data.drop(config.TARGET,axis=1), data[config.TARGET]
X = X[config.PREDICTORS]

# Train-Test Split
# ================
X_train, X_tests, y_train, y_tests = train_test_split(X, y, test_size=0.3, random_state=config.SEED)


# Defining the Pipeline steps
# ===========================
steps = [
    ('to_float', Change_Data_Type('Kms_Driven', 'Float')),
    ('years_passed', Years_Passed('Year', config.YEAR)),
    ('duplicates', Duplicated_Data()),
    ('one_hot_train', One_Hot_Encoding_Train(config.CATEGORICAL, drop_first=False)),
    ('scale_train', Standard_Scaling_Train(config.NUMERICAL)),
    ('reduce_memory', Reduce_Memory_Usage()),
    ('model', Regresor_Model(config.BOUNDS)),
]

# Producing the pipeline
# ======================
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)

and run the script, I get an error message saying that the module sklearn.preprocessing_functions cannot be found.

preprocessing_functions and machine_learning_toolbox are two scripts where I have stored the preprocessing classes and the optimized machine learning algorithm. In the examples I have seen, sklearn.pipeline.Pipeline is used only with built-in sklearn components, such as estimators = [('reduce_dim', PCA()), ('clf', SVC())].

Is there a workaround for building an sklearn pipeline out of our own preprocessing tools?
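For context, here is a minimal sketch of the kind of setup being asked about. It assumes that Pipeline only requires each intermediate step to implement fit and transform, and the final step to implement fit. The YearsPassed class, its parameters, and the toy data are hypothetical stand-ins, not the classes from preprocessing_functions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class YearsPassed(TransformerMixin, BaseEstimator):
    """Hypothetical custom step: replace a year column with elapsed years."""

    def __init__(self, column=0, reference_year=2020):
        # Store constructor arguments unchanged so get_params and clone work.
        self.column = column
        self.reference_year = reference_year

    def fit(self, X, y=None):
        # Nothing to learn here, but fit must still return self.
        return self

    def transform(self, X):
        # Work on a float copy so the caller's data is not modified.
        X = np.array(X, dtype=float)
        X[:, self.column] = self.reference_year - X[:, self.column]
        return X


# Toy data (made up): column 0 is a year, column 1 is kilometres driven.
X = np.array([[2010, 50000], [2015, 30000], [2018, 10000], [2012, 45000]])
y = np.array([3.0, 5.5, 8.0, 4.0])

pipeline = Pipeline([
    ("years_passed", YearsPassed(column=0, reference_year=2020)),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

One caveat with this approach: Pipeline passes only X through transform, so a step that drops rows (for example, removing duplicates) would leave X and y with different lengths. That kind of cleaning may be easier to do before the pipeline.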
