How to solve 5xx errors in GCP AI Prediction Platform?


We have been able to deploy models (both custom prediction and TensorFlow SavedModel formats) to AI Prediction Platform, and basic testing shows that online predictions are at least functional. We are now load testing before putting this into production, and we are running into some stability issues.

We are seeing a variety of errors:

- 429: "Rate of traffic exceeds serving capacity. Decrease your traffic or reduce the size of your model"
- 503: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
- 504: "Timed out waiting for notification."

We've implemented exponential backoff on the client, and retrying generally clears the above errors over time. However, we want to make sure we understand what's going on.
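For reference, the retry logic is roughly along these lines. This is a minimal sketch rather than our exact client: `send_request` is a hypothetical zero-argument callable that issues one online prediction request and returns `(status_code, body)`, and the delay values are illustrative assumptions.

```python
import random
import time

# Status codes from the errors listed above that we treat as retryable.
RETRYABLE_STATUSES = {429, 503, 504}


def backoff_delays(max_attempts, base_delay=1.0, max_delay=32.0, rng=random.random):
    """Yield one capped, jittered delay (in seconds) per allowed retry."""
    for attempt in range(max_attempts - 1):
        cap = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: pick a random delay between 0 and the current cap.
        yield rng() * cap


def predict_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                         max_delay=32.0, sleep=time.sleep, rng=random.random):
    """Call send_request() until it returns a non-retryable status.

    send_request is a hypothetical callable returning (status_code, body).
    Returns (status_code, body, attempts_used).
    """
    status, body = send_request()
    attempts = 1
    for delay in backoff_delays(max_attempts, base_delay, max_delay, rng):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status, body = send_request()
        attempts += 1
    return status, body, attempts
```

The jitter is there so that many load-test clients do not all retry at the same instant, which could otherwise keep traffic above serving capacity.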

The 429s seem straightforward - wait for things to scale.

For the 503 / 504 errors, we're not sure what the cause is or how to resolve or eliminate them. We have experimented with batch size (as per "TensorFlow model serving on Google AI Platform online prediction too slow with instance batches"; it appears that the service doesn't make any internal optimizations for larger batches), machine size, etc. We're not sure whether it's a resource issue, though we see these errors even with small batch sizes (instance count).
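To compare batch sizes during load testing, we split the instances into fixed-size batches and count the response codes for each setting, roughly as sketched below. `send_batch` is a hypothetical callable that sends one batch of instances and returns the HTTP status code; it stands in for the real prediction call.

```python
from collections import Counter


def chunk_instances(instances, batch_size):
    """Yield consecutive slices of at most batch_size instances."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


def tally_statuses(instances, batch_size, send_batch):
    """Send every batch once and count how often each status code occurs.

    send_batch is a hypothetical callable taking a list of instances and
    returning an HTTP status code.
    """
    counts = Counter()
    for batch in chunk_instances(instances, batch_size):
        counts[send_batch(batch)] += 1
    return counts
```

Running `tally_statuses` over the same instances with several batch sizes gives a per-setting count of 200, 429, 503, and 504 responses, which makes it easier to see whether the errors track the instance count at all.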

Anybody else experiencing these issues? Any best practices to suggest? Thanks!
