How can I use estimators that are not in sklearn in a model pipeline?

I tried to use a statsmodels ARIMA model with GridSearchCV, but it raises:

"TypeError: Cannot clone object '' (type ): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods. "

import numpy as np
import pandas as pd
from sklearn.grid_search import GridSearchCV
from statsmodels.tsa.arima_model import ARIMA

# Small daily revenue series used to reproduce the error
df_original = pd.DataFrame({"date_col": ['2016-08-01', '2016-08-02', '2016-08-03', '2016-08-04', '2016-08-05',
                                         '2016-08-06', '2016-08-07', '2016-08-08', '2016-08-09', '2016-08-10',
                                         '2016-08-11'],
                            'sum_base_revenue_cip': [1, 2, 7, 5, 1, 2, 5, 10, 9, 0, 1]})
# Log-transform; the small offset avoids log(0)
df_original["sum_base_revenue_cip"] = np.log(df_original["sum_base_revenue_cip"] + 1e-6)
df_original_ts = df_original.copy(deep=True)
df_original_ts['date_col'] = pd.to_datetime(df_original['date_col'])
df_original_ts = df_original_ts.set_index('date_col')
print(df_original_ts)

# ARIMA takes the data in its constructor and has no get_params(),
# so GridSearchCV cannot clone it
estimator = ARIMA(df_original_ts, order=(1, 1, 0))
params = {
    'order': ((2, 1, 0), (0, 2, 1), (1, 0, 0))
}
grid_search = GridSearchCV(estimator,
                           params,
                           n_jobs=-1,
                           verbose=True)
grid_search.fit(df_original_ts)
There is 1 answer

Answered by simon
  1. Find an existing sklearn-compatible wrapper for the model.
  2. Write your own wrapper class that inherits from BaseEstimator and meets the requirements for an sklearn estimator, e.g. every parameter must be named explicitly in the __init__ signature.
  3. Roll your own grid search by looping over the parameter combinations.
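
Options 2 and 3 could look roughly like the sketch below. To keep it runnable without statsmodels, it uses a hypothetical stand-in model, ThirdPartySmoother, which mimics the ARIMA calling pattern (data passed to the constructor, no get_params). The names ThirdPartySmoother, SmootherWrapper and manual_score, the alpha parameter, and the choice of TimeSeriesSplit with mean squared error are all assumptions made for illustration, not part of the question. For ARIMA, the wrapper's fit and predict would presumably call the statsmodels model instead.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


class ThirdPartySmoother:
    """Hypothetical stand-in for a non-sklearn model such as ARIMA:
    it takes the data in its constructor and has no get_params()."""

    def __init__(self, endog, alpha=0.5):
        self.endog = np.asarray(endog, dtype=float)
        self.alpha = alpha

    def fit(self):
        # Simple exponential smoothing; the final level is the forecast.
        level = self.endog[0]
        for value in self.endog[1:]:
            level = self.alpha * value + (1 - self.alpha) * level
        self.level_ = level
        return self

    def forecast(self, steps):
        return np.full(steps, self.level_)


class SmootherWrapper(RegressorMixin, BaseEstimator):
    """Option 2: hyperparameters are explicit __init__ arguments."""

    def __init__(self, alpha=0.5):
        # Stored unchanged so the inherited get_params() and clone() work.
        self.alpha = alpha

    def fit(self, X, y):
        # X is only a placeholder time index; the series itself is y.
        self.model_ = ThirdPartySmoother(y, alpha=self.alpha).fit()
        return self

    def predict(self, X):
        return self.model_.forecast(len(X))


y = np.array([1, 2, 7, 5, 1, 2, 5, 10, 9, 0, 1], dtype=float)
X = np.arange(len(y)).reshape(-1, 1)

param_grid = {"alpha": [0.2, 0.5, 0.8]}
cv = TimeSeriesSplit(n_splits=3)  # keeps training data before test data

# Option 2: the wrapper can be cloned, so GridSearchCV accepts it.
search = GridSearchCV(SmootherWrapper(), param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)


# Option 3: a hand-rolled search over the same grid and the same splits.
def manual_score(alpha):
    errors = []
    for train_idx, test_idx in cv.split(X):
        model = ThirdPartySmoother(y[train_idx], alpha=alpha).fit()
        forecast = model.forecast(len(test_idx))
        errors.append(np.mean((y[test_idx] - forecast) ** 2))
    return np.mean(errors)


manual_best = min(param_grid["alpha"], key=manual_score)
print(manual_best)
```

The wrapper stores alpha untouched in __init__ and does all real work in fit, which is what lets the inherited get_params and clone behave as GridSearchCV expects. The manual loop needs no wrapper at all, at the cost of reimplementing the splitting and scoring yourself.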