Traffic distribution inside a GCP instance group


I have a regional instance group with 2 instances serving on port 8000; autoscaling is pinned to min = max = 2, so the group size is effectively fixed.

This instance group is the backend of a Global external application load balancer.

The web server on the instances runs Flask, and each request takes 1 second to process:

from socket import gethostname
from time import sleep

from flask import Flask, Response, jsonify

app = Flask(__name__)


@app.route("/", methods=["GET"])
def main() -> Response:
    sleep(1)  # simulate 1 second of processing per request
    response = jsonify(
        success=True,
        hostname=gethostname(),  # identifies which instance responded
    )
    response.status_code = 200
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response

I can confirm this behaviour by hitting the global load balancer's public IP: it takes 1 second to get a response.

If I run 2 requests at the same time, since there are 2 instances I expect each request to go to a different instance. Instead they always go to the same instance and are queued on it (the sketch after the list below shows how this can be verified), so:

  • one request will take 1 sec
  • the other request will take 2 sec
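
Something like this sketch can verify which instance serves each request; x.x.x.x is a placeholder for the load balancer IP, and it assumes the / endpoint returns the hostname field shown above:

import time
from concurrent.futures import ThreadPoolExecutor

import requests  # pip install requests

URL = "http://x.x.x.x/"  # placeholder: the load balancer public IP


def timed_get(_: int) -> str:
    start = time.monotonic()
    body = requests.get(URL, timeout=10).json()
    return f"{body['hostname']}: {time.monotonic() - start:.2f}s"


# Fire both requests at the same time and report which instance answered.
with ThreadPoolExecutor(max_workers=2) as pool:
    for result in pool.map(timed_get, range(2)):
        print(result)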

I have tried to change the Locality load balancing policy to Round-Robin, Least-Request and Random, but I always get the same behaviour.

Am I understanding correctly that the Locality load balancing policy only controls which backend is chosen? If so, how do you configure the load balancing policy inside a backend (i.e. within an instance group)?

Thanks

Configuration

Instance group

  • Regional
  • Target distribution shape: Even
  • [x] Allow instance redistribution
  • Autoscaling on: min 2, max 2
  • Autoscaling signal: HTTP load-balancing 100%
  • Initialisation period: 60s
  • Health check (a minimal handler sketch follows this list):
    • Path: /health
    • Protocol: HTTP
    • Port: 8000
    • Interval: 30 sec
    • Timeout: 30 sec
    • Healthy threshold: 1
    • Unhealthy threshold: 10
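
The health check above probes /health on port 8000. The question does not show how that path is implemented; a minimal, hypothetical handler matching this configuration could look like:

@app.route("/health", methods=["GET"])
def health() -> Response:
    # Hypothetical handler for the configured health check path;
    # it must return HTTP 200 for the instance to be marked healthy.
    return jsonify(success=True)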

Load balancer

  • Frontend:
    • Protocol: HTTP
    • IP: x.x.x.x
    • Network Tier: Premium
    • HTTP keepalive timeout: 610 sec
  • Routing rules: All unmatched
  • Backend services:
    • Endpoint protocol: HTTP
    • Named port: web
    • Timeout: 300 seconds
    • Cloud CDN: Disabled
    • Logging: Enabled (sample rate: 1)
    • Session affinity: None
    • Connection draining timeout: 300 seconds
    • Traffic policy:
      • Locality load balancing policy: Round robin
      • Outlier detection: Disabled
    • Backend security policy: None
    • Edge security policy: None
    • Identity-Aware Proxy: Disabled
  • Balancing mode: Max. RPS: 1 (per instance)
  • Capacity: 100%

Test

siege \
    --concurrent 1 \
    --time 60s \
    "http://x.x.x.x"

With 2 nodes:

  • concurrent=1: avg 1.02 sec
  • concurrent=2: avg 1.66 sec
  • concurrent=4: avg 3.35 sec
  • concurrent=8: avg 5.54 sec

With 4 nodes:

  • concurrent=1: avg 1.02 sec
  • concurrent=2: avg 1.18 sec
  • concurrent=4: avg 2.70 sec
  • concurrent=8: avg 3.83 sec
  • concurrent=16: avg 7.26 sec

With 8 nodes:

  • concurrent=2: avg 1.20 sec
  • concurrent=4: avg 1.85 sec
  • concurrent=16: avg 4.40 sec
  • concurrent=64: avg 14.06 sec
  • concurrent=128: avg 19.04 sec

Expected behaviour

I would have assumed results like the following (a quick sanity check follows the list):

  • 2 nodes:
    • concurrent=1: 1 sec
    • concurrent=2: 1 sec
    • concurrent=4: ~2 sec
    • concurrent=8: ~4 sec
  • 4 nodes:
    • concurrent=1: 1 sec
    • concurrent=2: 1 sec
    • concurrent=4: 1 sec
    • concurrent=8: ~2 sec
    • concurrent=16: ~4 sec
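
The expectation follows from simple arithmetic: with n simultaneous 1-second requests spread evenly across c instances, the busiest instance serves about ceil(n / c) requests back to back, so the slowest response takes roughly that many seconds:

from math import ceil

# n simultaneous 1-second requests spread evenly across c instances:
# the busiest instance handles ceil(n / c) requests back to back.
for c in (2, 4):
    for n in (1, 2, 4, 8, 16):
        print(f"{c} nodes, concurrent={n}: ~{ceil(n / c)} sec")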

Update 1

If I switch to a classic proxy Network Load Balancer and send 100 requests:

  • 56 go to vm0
  • 44 go to vm1

Instead, with the HTTP load balancer (counted with the tally sketch below):

  • 99 go to vm0
  • 1 goes to vm1
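
A tally along these lines can reproduce the counts above (a sketch; x.x.x.x again stands for the load balancer IP):

from collections import Counter

import requests  # pip install requests

URL = "http://x.x.x.x/"  # placeholder: the load balancer public IP

# Send 100 sequential requests and count which instance served each one.
counts = Counter()
for _ in range(100):
    counts[requests.get(URL, timeout=10).json()["hostname"]] += 1

for hostname, hits in counts.most_common():
    print(f"{hits} went to {hostname}")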

1 Answer

Answer by Dion V

It might be that the Max. RPS: 1 (per instance) setting in the balancing mode is causing requests to queue on a single instance instead of being distributed evenly. You can try changing the load balancer's Balancing mode configuration for Rate, i.e. raising the maximum RPS per instance.

You can also check whether your Flask application can be modified to handle multiple requests concurrently. To simulate concurrent clients, you can use the --concurrent parameter in the siege command. As an example:

siege --concurrent 2 "http://x.x.x.x"

This sends 2 requests concurrently. Note that this only simulates concurrent clients for testing; it does not change the actual behavior of the Flask application.
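
On the Flask side, if the app happens to be served by the built-in development server with threading disabled (the question does not say how it is launched), one hypothetical way to allow concurrent handling is:

if __name__ == "__main__":
    # threaded=True makes the Werkzeug dev server handle each request
    # in its own thread, so simultaneous requests are not serialized.
    app.run(host="0.0.0.0", port=8000, threaded=True)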