Traffic distribution inside a GCP instance group


I have a regional instance group with 2 instances serving on port 8000; autoscaling is pinned to min = max = 2, so the group size is effectively fixed.

This instance group is the backend of a Global external application load balancer.

The web server on the instances runs Flask, and each request takes 1 second to process:

from socket import gethostname
from time import sleep

from flask import Flask, Response, jsonify

app = Flask(__name__)


@app.route("/", methods=["GET"])
def main() -> Response:
    sleep(1)  # simulate 1 second of processing per request
    response = jsonify(
        success=True,
        hostname=gethostname(),  # identifies which instance responded
    )
    response.status_code = 200
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response

I can confirm this behaviour by hitting the global load balancer's public IP: it takes 1 second to get a response.

If I run 2 requests at the same time, since there are 2 instances I expect each request to go to a different instance. Instead they always go to the same instance and are queued on it (the sketch after the list below shows how this can be verified), so:

  • one request will take 1 sec
  • the other request will take 2 sec
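
Something like this sketch can verify which instance serves each request; x.x.x.x is a placeholder for the load balancer IP, and it assumes the / endpoint returns the hostname field shown above:

import time
from concurrent.futures import ThreadPoolExecutor

import requests  # pip install requests

URL = "http://x.x.x.x/"  # placeholder: the load balancer public IP


def timed_get(_: int) -> str:
    start = time.monotonic()
    body = requests.get(URL, timeout=10).json()
    return f"{body['hostname']}: {time.monotonic() - start:.2f}s"


# Fire both requests at the same time and report which instance answered.
with ThreadPoolExecutor(max_workers=2) as pool:
    for result in pool.map(timed_get, range(2)):
        print(result)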

I have tried to change the Locality load balancing policy to Round-Robin, Least-Request and Random, but I always get the same behaviour.

Am I understanding correctly that the Locality load balancing policy only controls which backend is chosen? If so, how do you configure the load balancing policy inside a backend (i.e. within an instance group)?

Thanks

Configuration

Instance group

  • Regional
  • Target distribution shape: Even
  • [x] Allow instance redistribution
  • Autoscaling on: min 2, max 2
  • Autoscaling signal: HTTP load-balancing 100%
  • Initialisation period: 60s
  • Health check (a minimal handler sketch follows this list):
    • Path: /health
    • Protocol: HTTP
    • Port: 8000
    • Interval: 30 sec
    • Timeout: 30 sec
    • Healthy threshold: 1
    • Unhealthy threshold: 10
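
The health check above probes /health on port 8000. The question does not show how that path is implemented; a minimal, hypothetical handler matching this configuration could look like:

@app.route("/health", methods=["GET"])
def health() -> Response:
    # Hypothetical handler for the configured health check path;
    # it must return HTTP 200 for the instance to be marked healthy.
    return jsonify(success=True)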

Load balancer

  • Frontend:
    • Protocol: HTTP
    • IP: x.x.x.x
    • Network Tier: Premium
    • HTTP keepalive timeout: 610 sec
  • Routing rules: All unmatched
  • Backend services:
    • Endpoint protocol: HTTP
    • Named port: web
    • Timeout: 300 seconds
    • Cloud CDN: Disabled
    • Logging: Enabled (sample rate: 1)
    • Session affinity: None
    • Connection draining timeout: 300 seconds
    • Traffic policy:
      • Locality load balancing policy: Round robin
      • Outlier detection: Disabled
    • Backend security policy: None
    • Edge security policy: None
    • Identity-Aware Proxy: Disabled
  • Balancing mode: Max. RPS: 1 (per instance)
  • Capacity: 100%

Test

siege \
    --concurrent 1 \
    --time 60s \
    "http://x.x.x.x"

With 2 nodes:

  • concurrent=1: avg 1.02 sec
  • concurrent=2: avg 1.66 sec
  • concurrent=4: avg 3.35 sec
  • concurrent=8: avg 5.54 sec

With 4 nodes:

  • concurrent=1: avg 1.02 sec
  • concurrent=2: avg 1.18 sec
  • concurrent=4: avg 2.70 sec
  • concurrent=8: avg 3.83 sec
  • concurrent=16: avg 7.26 sec

With 8 nodes:

  • concurrent=2: avg 1.20 sec
  • concurrent=4: avg 1.85 sec
  • concurrent=16: avg 4.40 sec
  • concurrent=64: avg 14.06 sec
  • concurrent=128: avg 19.04 sec

Expected behaviour

I would have assumed results like the following (a quick sanity check follows the list):

  • 2 nodes:
    • concurrent=1: 1 sec
    • concurrent=2: 1 sec
    • concurrent=4: ~2 sec
    • concurrent=8: ~4 sec
  • 4 nodes:
    • concurrent=1: 1 sec
    • concurrent=2: 1 sec
    • concurrent=4: 1 sec
    • concurrent=8: ~2 sec
    • concurrent=16: ~4 sec
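
The expectation follows from simple arithmetic: with n simultaneous 1-second requests spread evenly across c instances, the busiest instance serves about ceil(n / c) requests back to back, so the slowest response takes roughly that many seconds:

from math import ceil

# n simultaneous 1-second requests spread evenly across c instances:
# the busiest instance handles ceil(n / c) requests back to back.
for c in (2, 4):
    for n in (1, 2, 4, 8, 16):
        print(f"{c} nodes, concurrent={n}: ~{ceil(n / c)} sec")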

Update 1

If I switch to a classic proxy Network Load Balancer and send 100 requests:

  • 56 go to vm0
  • 44 go to vm1

Instead, with the HTTP load balancer (counted with the tally sketch below):

  • 99 go to vm0
  • 1 goes to vm1
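
A tally along these lines can reproduce the counts above (a sketch; x.x.x.x again stands for the load balancer IP):

from collections import Counter

import requests  # pip install requests

URL = "http://x.x.x.x/"  # placeholder: the load balancer public IP

# Send 100 sequential requests and count which instance served each one.
counts = Counter()
for _ in range(100):
    counts[requests.get(URL, timeout=10).json()["hostname"]] += 1

for hostname, hits in counts.most_common():
    print(f"{hits} went to {hostname}")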

1 Answer

Answer by Dion V

It might be that the Max. RPS: 1 (per instance) setting in the balancing mode is causing requests to queue on a single instance instead of being distributed evenly. You can try changing the load balancer's Balancing mode configuration for Rate, i.e. raising the maximum RPS per instance.

You can also check whether your Flask application can be modified to handle multiple requests concurrently. To simulate concurrent clients, you can use the --concurrent parameter in the siege command. As an example:

siege --concurrent 2 "http://x.x.x.x"

This sends 2 requests concurrently. Note that this only simulates concurrent clients for testing; it does not change the actual behavior of the Flask application.
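
On the Flask side, if the app happens to be served by the built-in development server with threading disabled (the question does not say how it is launched), one hypothetical way to allow concurrent handling is:

if __name__ == "__main__":
    # threaded=True makes the Werkzeug dev server handle each request
    # in its own thread, so simultaneous requests are not serialized.
    app.run(host="0.0.0.0", port=8000, threaded=True)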