Azure ML - AKS Service deployment unable to handle concurrent requests despite auto scaling enabled


I have deployed around 23 models (1.57 GB in total) in an Azure ML workspace using Azure Kubernetes Service. The AKS cluster has 3 D8s_v3 nodes, with cluster autoscaling enabled up to 6 nodes. The AksWebservice is configured with 4.4 cores and 16 GB of memory. I have also enabled pod autoscaling for the web service, with autoscale_max_replicas set to 40:

from azureml.core.webservice import AksWebservice

# Per-replica resources plus pod autoscaling; max_request_wait_time is in milliseconds.
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=4.4, memory_gb=16, enable_app_insights=True,
    description='TEST - Configuration for Kubernetes Compute Target',
    autoscale_enabled=True, autoscale_target_utilization=0.6, autoscale_max_replicas=40,
    max_request_wait_time=25000)

I ran load tests with 10 concurrent users (using JMeter) and monitored the cluster in Application Insights.

I can see the nodes and pods scaling, but there is no spike in CPU or memory utilization. Out of 10 concurrent requests, only 5 to 6 succeed and the rest fail. A single request sent to the deployed endpoint gets a response in 7 to 9 seconds. In the load test logs, however, many requests take more than 15 seconds, and requests that take more than 25 seconds fail with status code 503. That is why I raised max_request_wait_time to 25000 ms.

I don't understand why responses take this long with so much compute available, when the dashboard shows memory utilization below 30%. Should I change the replica_max_concurrent_requests parameter, or increase autoscale_max_replicas even further? The concurrent request load may sometimes reach 100 in production. Is there a way to handle that?
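As a rough sanity check of the numbers above, here is a back-of-the-envelope sketch in plain Python (no Azure SDK). It assumes several things not confirmed in this setup: that replica_max_concurrent_requests is at its default of 1 (it is not set in the config above), that a D8s_v3 node offers its full 8 vCPUs and 32 GiB to pods (in practice some is reserved, so this is optimistic), and that queued requests are served in simple "waves". The function names are made up for illustration.

```python
import math

def replicas_per_node(node_cores, node_mem_gb, pod_cores, pod_mem_gb):
    # A replica only fits if both its CPU and memory requests fit on the node.
    return min(math.floor(node_cores / pod_cores),
               math.floor(node_mem_gb / pod_mem_gb))

def max_in_flight(nodes, per_node, max_replicas, concurrent_per_replica=1):
    # Replicas are capped by what the nodes can host, not only by autoscale_max_replicas.
    # concurrent_per_replica=1 is an assumption (the default is not overridden above).
    return min(nodes * per_node, max_replicas) * concurrent_per_replica

def worst_case_latency(concurrent_users, capacity, service_seconds):
    # Simplified model: requests beyond capacity wait for whole earlier "waves".
    return math.ceil(concurrent_users / capacity) * service_seconds

per_node = replicas_per_node(8, 32, 4.4, 16)      # 4.4 cores fits once in 8 vCPUs
capacity_start = max_in_flight(3, per_node, 40)   # 3 nodes before scale-out
capacity_scaled = max_in_flight(6, per_node, 40)  # 6 nodes after scale-out

print(per_node, capacity_start, capacity_scaled)
print(worst_case_latency(10, capacity_start, 9))   # seconds, 3 replicas
print(worst_case_latency(10, capacity_scaled, 9))  # seconds, 6 replicas
```

Under these assumptions only one 4.4-core replica fits per node, so the 6-node limit caps the service at about 6 replicas no matter how high autoscale_max_replicas is set. That would be consistent with 5 to 6 of 10 requests succeeding and with waits beyond the 25-second limit before scale-out completes, although the real cause may differ.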

Will be grateful for any advice. Thanks.
