Model Random Forest Regressor with scikit-learn and Flask


I have a Flask application that works in my local environment, but it does not work when I run it in production mode.

I'm using pickle to save my model, and I also tried joblib.

The problem occurs when I load the pickle file: I get a 504 timeout error. Once the file has been generated by the training step, I load it like this: model = pickle.load(open(file))
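For reference, pickle.load takes an open file object rather than a path, and in Python 3 the file has to be opened in binary mode. Below is a minimal sketch of a save/load round trip. The dict standing in for the trained model and the temporary path are assumptions made for illustration, not part of the original application.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (hypothetical; any picklable object works).
model = {"n_estimators": 10, "min_samples_leaf": 1}

# Hypothetical location for the pickle file.
path = os.path.join(tempfile.mkdtemp(), "model.pickle")

# Write in binary mode; the context manager closes the file afterwards.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read back in binary mode as well.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Using a with block also guarantees the file handle is released, which a bare open(...) inside the call does not.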

I'm pretty sure the pickle file generated by the training is what throws this error (I tested with other pickle files).

After more investigation, it seems that the steps injected through the Pipeline class cause the problem:

model = Pipeline(
        [
            ('features', my_data),
            ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
        ])
...
pickle.dump(model, file)

This works just fine:

model = Pipeline(
        [
            ('features', my_data),
            ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
        ])
model = {}
model["foo"] = "bar"
pickle.dump(model, file)

I don't have any trouble with the Flask development server, only in the production environment (Apache), and of course I don't want to use the development server in production.

Any idea why the 504 error occurs in the production environment?

EDIT: This is the method where I use pickle.load(...):

import pickle

import pandas as pd
from flask import jsonify


def recup_df():
    df = pd.read_pickle("dataframe.pickle")
    # pickle.load needs an open binary file object, not a path string
    with open("model.pickle", "rb") as f:
        mod = pickle.load(f)
    X = df.head(20).drop(['price'], axis=1)
    y = df.head(20).price.values.copy()
    predict_df = pd.DataFrame.from_dict({
        'predicted': mod.predict(X),
        'true': y,
        'make': X.make,
        'model': X.model
    })
    prediction = dict()
    result = 1
    for data in predict_df.itertuples():
        str_result = "result n°{}".format(result)
        car_name = "{} {}".format(data.make, data.model)
        prediction[str_result] = {
            car_name: [{
                "true price": data.true,
                "predict price": data.predicted
            }]
        }
        result += 1
    output = {
        "prediction": prediction
    }
    return jsonify(output)

There is 1 answer

Answered by fpajot

There is an issue with pickle.dump when it comes to Pipeline objects composed of different transformers.
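One likely reason, offered as an assumption rather than a diagnosis of this exact setup: pickle stores a class by reference (module name plus class name), not by value, so the class must be importable under the same name wherever the file is written or read. The sketch below uses a hypothetical MyFeatures transformer defined inside a function, so that it cannot be looked up by name, and shows plain pickle refusing it.

```python
import pickle


def build_transformer():
    # Hypothetical transformer whose class is not importable by name,
    # similar to a class that only exists inside the training script.
    class MyFeatures:
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X

    return MyFeatures()


try:
    pickle.dumps(build_transformer())
    error = None
except (pickle.PicklingError, AttributeError) as exc:
    # pickle cannot resolve the class by module and name, so it refuses.
    error = exc
```

The same by-name lookup happens at load time, which is why a pipeline pickled from a training script can fail to load inside a different process such as the Apache worker.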

Here is a previous post regarding the issue with relevant solutions: How to properly pickle sklearn pipeline when using custom transformer

I gave cloudpickle a try and it worked with sklearn.pipeline.Pipeline.
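A minimal sketch of that approach, assuming the third-party cloudpickle package is installed in both the training and the serving environment. The Doubler class is a hypothetical stand-in for a custom pipeline step, and no scikit-learn objects are used here to keep the example short.

```python
import pickle

import cloudpickle  # third-party: pip install cloudpickle


def build_transformer():
    # Hypothetical custom transformer, defined locally so that plain
    # pickle could not reference it by module and name.
    class Doubler:
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return [[value * 2 for value in row] for row in X]

    return Doubler()


# cloudpickle serializes the class definition itself (by value).
payload = cloudpickle.dumps(build_transformer())

# The result is a regular pickle stream, so pickle.loads can read it,
# provided cloudpickle is installed in the loading environment.
restored = pickle.loads(payload)
result = restored.transform([[1, 2], [3]])
```

For a file on disk, cloudpickle.dump(model, f) plays the same role as pickle.dump(model, f), and the loading side can keep using pickle.load(f).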