I want to set up a system in which an S3 bucket holds the model's pickle files and an exposed API loads the latest pickle file into the BentoML model store. I then want the Bento service to pick up that latest pickle file and serve predictions from the new model instead of the old one.
My service.py looks like this:
import bentoml
from bentoml.io import JSON

# Fetch the newest saved model and build the runner once, at import time
bento_model = bentoml.sklearn.get(MODEL_NAME + ":latest")
mail_classification_runner = bentoml.Runner(
    IrisClassifier,
    runnable_init_params={"modelName": MODEL_NAME},
    models=[bento_model],
)
svm = bentoml.Service("iris_classifier", runners=[mail_classification_runner])

@svm.api(input=JSON(), output=JSON())
def loadNewModel(input: dict) -> dict:
    # Get the latest model and dump it into BentoML
    isDumped = bentoMLHelper.dumpModelInBentoML()
    # Try to replace the runner with one built from the latest model
    global mail_classification_runner
    mail_classification_runner.destroy()
    bento_model = bentoml.sklearn.get(MODEL_NAME + ":latest")
    mail_classification_runner = bentoml.Runner(
        IrisClassifier,
        runnable_init_params={"modelName": MODEL_NAME},
        models=[bento_model],
    )
    return {"modelLoaded": str(isDumped)}
Since I made the runner a global variable, I expected that whenever someone hits the API for loading a new model, the runner would be replaced with one built from the new model, and the service would use that runner for all subsequent classifier requests.
In practice, the model that was loaded when the server started is still used, even after calling loadNewModel. In the BentoML repo the new model has been added and the "latest" file has been updated to point to it, but because the service is not using the new runner instance, it always predicts with the old model.
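To illustrate the behaviour described above without involving BentoML, here is a minimal plain-Python sketch. FakeRunner and FakeService are hypothetical stand-ins, not BentoML classes, and the sketch only assumes that a service keeps references to the runner objects it was constructed with. Whether BentoML does exactly this internally is an assumption, not something confirmed here.

```python
# Hypothetical stand-ins for illustration only; these are NOT BentoML classes.
class FakeRunner:
    def __init__(self, model_tag):
        self.model_tag = model_tag

    def predict(self, x):
        return f"{self.model_tag} predicted {x}"


class FakeService:
    def __init__(self, runners):
        # The service stores references to the runner objects it was given.
        self.runners = list(runners)

    def predict(self, x):
        return self.runners[0].predict(x)


runner = FakeRunner("model:v1")
service = FakeService(runners=[runner])


def load_new_model():
    global runner
    # Rebinds the module-level name only; service.runners still holds
    # the object created before this call.
    runner = FakeRunner("model:v2")


load_new_model()
result_via_global = runner.predict("a")    # goes through the rebound name
result_via_service = service.predict("a")  # goes through the stored reference
```

In this sketch the global name points at the new object after load_new_model(), while the service still holds the original one. If BentoML behaves similarly, reassigning the global in service.py would not change which runner the already-created service uses.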