Google Vertex AI endpoint is unavailable when deploying a new model

Every night we train a new model and deploy it (in Flask) to an existing endpoint. The Flask code (simplified) is:

from flask import Flask, request
from werkzeug.exceptions import BadRequest


def load_models():
    """
    Load new model from google cloud storage
    """
    ...  # loading code omitted
    return model


app = Flask(__name__)
app.json.ensure_ascii = False

# The model is loaded synchronously at import time, before any request is served
MODELS_ARE_LOADED = False
model = load_models()
MODELS_ARE_LOADED = True


@app.route('/predict', methods=["POST"])
def predict():
    data = request.get_data()
    predictions = model.predict(data)

    return {"predictions": predictions}


@app.route('/health', methods=["GET"])
def health():
    # Fail the health check until the model has finished loading
    if not MODELS_ARE_LOADED:
        raise BadRequest("Models are not loaded yet")
    return "OK"

This works fine. The issue is that the old model seems to be removed before the new model is ready, which leaves the endpoint unavailable for a few minutes.

The MODELS_ARE_LOADED check seems to work locally, i.e. it returns the error message when the models are not loaded, and as far as I understand the endpoint is not considered "ready" until it is healthy.
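
To make the intended health-check behaviour concrete, here is a minimal, framework-free sketch of the same gate. It is not the deployed code: `load_models` is a hypothetical stand-in that just sleeps, `load_in_background` and `models_ready` are names made up for the illustration, and `health()` returns a `(status, body)` tuple instead of a Flask response. It also uses 503 rather than the 400 that `BadRequest` produces, on the assumption that the platform treats any non-200 response as unhealthy.

```python
import threading
import time

# Hypothetical stand-in for the real load_models(); the sleep simulates
# a slow download from cloud storage.
def load_models(delay_seconds=0.2):
    time.sleep(delay_seconds)
    return object()

# An Event is used as the readiness flag so it can be set safely from
# the loader thread and read from request handlers.
models_ready = threading.Event()
model = None

def load_in_background():
    """Load the model, then mark the service as ready."""
    global model
    model = load_models()
    models_ready.set()

def health():
    """Return (status_code, body) the way a /health route would."""
    if not models_ready.is_set():
        # Assumption: any non-200 status fails the health check.
        return 503, "Models are not loaded yet"
    return 200, "OK"

# Query the health check before loading starts, then again once it is done.
status_before = health()
loader = threading.Thread(target=load_in_background, daemon=True)
loader.start()
loader.join()
status_after = health()

print(status_before)
print(status_after)
```

The difference from the simplified Flask code above is that loading happens in a background thread, so the process can answer health requests with a failure while the model is still loading rather than not answering at all.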

I would assume that when a new model is added, traffic wouldn't be routed to it before it is healthy, or am I wrong here?
