I have a Flask application that works in my local environment, but it doesn't work when I run it in production mode.
I'm using pickle to save my model, and I tested joblib too.
The problem occurs when I load the pickle file: I get a 504 timeout error.
I load the file like this, once it has been generated by the training:
model = pickle.load(open(file, "rb"))
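As a side note on the load call above, here is a minimal sketch of a pickle round trip that opens the file in binary mode and closes it with a context manager. The `model` dictionary and the temporary path are hypothetical stand-ins for the trained model and its file; this only illustrates the file handling and is not claimed to address the 504 error.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the trained model; any picklable object works.
model = {"name": "demo", "weights": [0.1, 0.2, 0.3]}

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "model.pickle")

# Write in binary mode; the context manager closes the file even on error.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read it back, again in binary mode.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == model)
```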
I'm pretty sure it's the pickle file generated by the training that throws this error (I tested with other pickle files).
After more investigation, it seems that the objects injected through the Pipeline function cause the problem:
model = Pipeline(
    [
        ('features', my_data),
        ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
    ])
...
pickle.dump(model, file)
This works just fine:
model = Pipeline(
    [
        ('features', my_data),
        ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
    ])
model = {}
model["foo"] = "bar"
pickle.dump(model, file)
I don't have any trouble with the Flask development server, only in the production environment (Apache), and of course I don't want to use the dev server in production.
Any idea why the 504 error occurs in the production environment?
EDIT: This is the method where I use pickle.load(...):
def recup_df():
    df = pd.read_pickle("dataframe.pickle")
    # pickle.load expects an open binary file object, not a path string
    with open("model.pickle", "rb") as f:
        mod = pickle.load(f)
    X = df.head(20).drop(['price'], axis=1)
    y = df.head(20).price.values.copy()
    predict_df = pd.DataFrame.from_dict({
        'predicted': mod.predict(X),
        'true': y,
        'make': X.make,
        'model': X.model
    })
    prediction = dict()
    result = 1
    for data in predict_df.itertuples():
        str_result = "result n°{}".format(result)
        car_name = "{} {}".format(data.make, data.model)
        prediction[str_result] = {
            car_name: [{
                "true price": data.true,
                "predict price": data.predicted
            }]
        }
        result += 1
    output = {
        "prediction": prediction
    }
    return jsonify(output)
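The method above reads both pickle files from disk on every request. If load time turns out to matter, one option is to load each file once per process and reuse the result. The sketch below uses `functools.lru_cache` for that; `load_pickle` and the demo file are hypothetical names introduced for illustration, and this is not claimed to be the cause of, or the fix for, the 504 error.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_pickle(path):
    """Load a pickle file once; later calls with the same path reuse it."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a hypothetical stand-in object instead of the real model.
demo_path = os.path.join(tempfile.mkdtemp(), "model.pickle")
with open(demo_path, "wb") as f:
    pickle.dump({"foo": "bar"}, f)

first = load_pickle(demo_path)   # reads the file
second = load_pickle(demo_path)  # served from the cache
print(first is second)
```

With this approach the cached object is shared between requests handled by the same process, so it should be treated as read-only.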
There is an issue with pickle.dump when it comes to Pipeline objects composed of different transformers.
Here is a previous post regarding the issue, with relevant solutions: "How to properly pickle sklearn pipeline when using custom transformer"
I gave cloudpickle a try and it worked with sklearn's Pipeline.
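To illustrate the difference, here is a minimal sketch under stated assumptions: `make_transformer` and `Scale` are hypothetical stand-ins for a custom transformer, and no real sklearn Pipeline is used so that the example stays small. Standard pickle stores a class only as a reference (module plus qualified name), so it cannot serialize a class that is not importable by name, whereas cloudpickle can serialize such a class by value. cloudpickle is a third-party package, so the import is guarded here.

```python
import pickle

try:
    import cloudpickle  # third-party package: pip install cloudpickle
except ImportError:     # keep the sketch runnable without it
    cloudpickle = None


def make_transformer(factor):
    """Return a hypothetical custom transformer whose class is local."""

    class Scale:
        def __init__(self, factor):
            self.factor = factor

        def transform(self, values):
            return [v * self.factor for v in values]

    return Scale(factor)


transformer = make_transformer(3)

# Standard pickle records only where to import the class from, so a class
# defined inside a function cannot be serialized.
try:
    pickle.dumps(transformer)
    pickle_ok = True
except (pickle.PicklingError, AttributeError):
    pickle_ok = False

# cloudpickle serializes the class definition itself; the resulting bytes
# are read back with the standard pickle.loads.
restored = None
if cloudpickle is not None:
    restored = pickle.loads(cloudpickle.dumps(transformer))
    print(restored.transform([1, 2]))
```

Note that cloudpickle must also be installed in the environment that loads the file, and that classes living in a regular importable module are still pickled by reference, so that module has to be importable on the production server as well.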