Load Balancer unhealthy and CDK deploy stuck

204 views Asked by At

I have a loadbalancer and a fargate service, which I deploy after another from CDK.

First problem is that the cdk deployment is not going through (see images below).

Second problem which could be the reason for the former is that the health check is failing, although the container health check works and when I set the loadbalancer to be accessible from the internet I could perform my health check and got "OK" with code 200 back.. target health check fails

container is healthy

My steps: 1: loadbalancer

    const relayerLoadBalancer = new elbv2.ApplicationLoadBalancer(
      this,
      "RelayerLoadBalancer",
      {
        vpc: vpc,
        internetFacing: false,
      }
    );

    this.relayerLoadBalancer = relayerLoadBalancer;

2: Taskdefinition and Container from Ecr (see gist)

3: relayer Service with Fargate (also in gist):

    const listener = props.relayerLoadBalancer.addListener("RelayerListener", {
        port: 80,
        protocol: elbv2.ApplicationProtocol.HTTP,
        open: true,
      });
  
      listener.connections.allowFromAnyIpv4(ec2.Port.allTraffic());
  
    const sg_service = new ec2.SecurityGroup(this, "RelayerSG", {
      vpc: vpc,
      allowAllOutbound: true,
      description: "Security group for Relayer tasks",
    });
    sg_service.addIngressRule(
      ec2.Peer.ipv4("0.0.0.0/0"),
      ec2.Port.allTraffic()
    );

    // Create a Fargate service
    const relayerService = new ecs.FargateService(this, "RelayerServiceM", {
      cluster: cluster,
      taskDefinition: props.taskDefinition,
      enableExecuteCommand: true,
      securityGroups: [sg_service],
      assignPublicIp: false,
      healthCheckGracePeriod: Duration.seconds(3600),
    });

    listener.addTargets("RelayerTarget", {
      port: 80,
      targets: [relayerService],
      healthCheck: {
        path: "/health",
        healthyHttpCodes: "200-499",
      },
    });

Ive tried to split the load balancer and fargate service but the cdk deployment is always stuck (for example: step 4/6) in the step where the service is created:

stuck cdk deployment image

Therefore the deployment in the console is also in progress: aws console image

I think maybe the cdk deploy fails because the health check fails but maybe also the health check fail because the deployment doesnt go through. Sounds confusing and I definitely am.

I've set the healthCheckGracePeriod to 5h, which should be more than enough time for the health checks to come through...

I've read something about that the health check must not be performed on the port 80 but since Ive opened up all traffic this shouldn't be a problem right?

1

There are 1 answers

0
moerv9 On BEST ANSWER

After 6h of meetings with aws engineers we figured it out...

Our approach was the following:

  1. Rule out ALB as cause of problem
    I setup an EC2 instance in the same subnet and performed a few connectivity tests to the ALB. I have set up another EC2 instance with an easy webserver on it and connected it to the ALB as the target. Using the following commands I could confirm that the ALB was working perfectly:
  • ping
  • nmap [private IP of target] -p1-65535 -> returns open ports
  • curl -Iv [private IP of target]:80/health -> returns 200 / "OK"
  1. ECS is the problem
    At first I checked all Security Groups, NACLs and the correct paths and ports from container and Health checks. My application uses a .env file where the HOST=localhost and PORT=80 are defined. That is why a curl http://localhost:80/health returned healthy, but the outside health check failed.

    It must be HOST=0.0.0.0 in the .env to be exposed to the outside.