Every night we train a new model and deploy it (in Flask) to an existing endpoint. The Flask code is (simplified) as follows:
from flask import Flask, request
from werkzeug.exceptions import BadRequest


def load_models():
    """
    Load the new model from Google Cloud Storage.
    """
    ...
    return model


app = Flask(__name__)
app.json.ensure_ascii = False

# The model is loaded at import time, before the first request is served
MODELS_ARE_LOADED = False
model = load_models()
MODELS_ARE_LOADED = True


@app.route('/predict', methods=["POST"])
def predict():
    data = request.get_data()
    predictions = model.predict(data)
    return {"predictions": predictions}


@app.route('/health', methods=["GET"])
def health():
    if not MODELS_ARE_LOADED:
        raise BadRequest("Models are not loaded yet")
    return "OK"
This works fine. The issue is that the old model seems to be removed before the new model is ready, leaving the endpoint unavailable for a few minutes.
The MODELS_ARE_LOADED check seems to work locally, i.e. it returns the error message while the models are not loaded, and as far as I understand the endpoint is not considered "ready" until it is healthy.
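To illustrate what I mean by checking it locally, here is a minimal sketch (the base URL, port, and payload are just placeholders, not my actual setup) that polls /health until the models are loaded and only then calls /predict:

import time
import requests

BASE_URL = "http://localhost:8000"  # placeholder local address

# Poll /health until it returns 200, i.e. until MODELS_ARE_LOADED is True
for _ in range(30):
    try:
        if requests.get(f"{BASE_URL}/health", timeout=1).status_code == 200:
            break
    except requests.exceptions.ConnectionError:
        pass  # server not up yet
    time.sleep(1)

# Once healthy, the predict endpoint responds as expected
resp = requests.post(f"{BASE_URL}/predict", data=b"some input")
print(resp.json())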
I would have assumed that, when deploying a new model, traffic wouldn't be routed to it before it is healthy. Or am I wrong here?