How can we make asynchronous requests to SageMaker endpoints?


I am a little confused by the documentation provided by Amazon's SageMaker.

I am trying to make an asynchronous request to a Sagemaker endpoint. By "asynchronous" I am referring to being able to utilise the Python asyncio event loop so I can make multiple requests to the endpoint in a non-blocking way.

I am not referring to the feature Amazon has listed here, whereby if you expect inference to take a while or you have a large payload (up to 1 hour or 1 GB respectively), Amazon will queue your request and then store the result in S3 for you to retrieve later.

I have found this AsyncPredictor class, but there aren't any examples of its usage, and from what I can tell it isn't the type of async I am looking for.

Throughout my search of the documentation and the sagemaker SDK GitHub repo, I couldn't find any mention of being able to make non-blocking requests in this way.

Does the SageMaker SDK support this or not? If not, is the only way to make async invocations of a SageMaker endpoint to manually create async sessions using something like the aiobotocore library?
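For context, this is roughly what I have in mind (a minimal sketch assuming the aiobotocore library; the endpoint name "my-endpoint" and the JSON payload format are placeholders for my actual model):

```python
# Minimal sketch, assuming aiobotocore is installed and "my-endpoint" is a
# hypothetical endpoint name; adapt the payload and content type to your model.
import asyncio
import json

from aiobotocore.session import get_session


async def invoke(client, payload):
    # invoke_endpoint is awaitable on the aiobotocore client
    response = await client.invoke_endpoint(
        EndpointName="my-endpoint",          # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = await response["Body"].read()     # response body is an async stream
    return json.loads(body)


async def main():
    session = get_session()
    async with session.create_client("sagemaker-runtime") as client:
        payloads = [{"inputs": f"example {i}"} for i in range(10)]
        # Fire all requests concurrently instead of blocking on each one
        results = await asyncio.gather(*(invoke(client, p) for p in payloads))
        print(results)


asyncio.run(main())
```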


There is 1 answer.

Answered by Ram Vegiraju:

Asynchronous Inference with SageMaker is a specific SageMaker Hosting feature; it is not related to Python's asyncio module. Async Inference lets you queue requests by passing an S3 input data location, achieving near-real-time inference with SageMaker endpoints. It also offers features such as autoscaling based on the backlog (queue) size and the ability to scale down to zero instances.
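For example, invoking an async-inference endpoint looks like this (a minimal sketch assuming boto3; the endpoint name and S3 URI are placeholders):

```python
# Minimal sketch, assuming boto3 and a hypothetical async endpoint;
# the endpoint name and S3 locations are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="my-async-endpoint",                   # hypothetical
    InputLocation="s3://my-bucket/input/payload.json",  # request payload lives in S3
    ContentType="application/json",
)

# SageMaker queues the request and writes the result to S3 when it is done
print(response["OutputLocation"])
print(response["InferenceId"])
```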

For reference here's a starter blog: https://aws.amazon.com/blogs/machine-learning/run-computer-vision-inference-on-large-videos-with-amazon-sagemaker-asynchronous-endpoints/