I have a load balancer and a Fargate service, which I deploy one after the other from CDK.
The first problem is that the CDK deployment is not going through (see images below).
The second problem, which could be the cause of the first, is that the target health check is failing, even though the container health check works: when I temporarily made the load balancer internet-facing, I could hit my health check endpoint myself and got "OK" back with code 200. Still, the target health check fails.
My steps: 1: load balancer
const relayerLoadBalancer = new elbv2.ApplicationLoadBalancer(
  this,
  "RelayerLoadBalancer",
  {
    vpc: vpc,
    internetFacing: false,
  }
);
this.relayerLoadBalancer = relayerLoadBalancer;
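Since internetFacing is false, this ALB is internal and only reachable from inside the VPC. For debugging it helps to output its DNS name so it can be curled from an instance in the same VPC; a small optional addition (assuming the usual import * as cdk from "aws-cdk-lib"):

// Optional: expose the internal ALB's DNS name for in-VPC testing
new cdk.CfnOutput(this, "RelayerAlbDnsName", {
  value: relayerLoadBalancer.loadBalancerDnsName,
});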
2: Task definition and container from ECR (see gist)
3: Relayer service with Fargate (also in the gist):
const listener = props.relayerLoadBalancer.addListener("RelayerListener", {
  port: 80,
  protocol: elbv2.ApplicationProtocol.HTTP,
  open: true,
});
listener.connections.allowFromAnyIpv4(ec2.Port.allTraffic());

const sg_service = new ec2.SecurityGroup(this, "RelayerSG", {
  vpc: vpc,
  allowAllOutbound: true,
  description: "Security group for Relayer tasks",
});
sg_service.addIngressRule(ec2.Peer.ipv4("0.0.0.0/0"), ec2.Port.allTraffic());

// Create a Fargate service
const relayerService = new ecs.FargateService(this, "RelayerServiceM", {
  cluster: cluster,
  taskDefinition: props.taskDefinition,
  enableExecuteCommand: true,
  securityGroups: [sg_service],
  assignPublicIp: false,
  healthCheckGracePeriod: Duration.seconds(3600),
});

listener.addTargets("RelayerTarget", {
  port: 80,
  targets: [relayerService],
  healthCheck: {
    path: "/health",
    healthyHttpCodes: "200-499",
  },
});
I've tried splitting the load balancer and the Fargate service apart, but the CDK deployment is always stuck (for example at step 4/6) at the step where the service is created. Accordingly, the deployment in the console is also stuck in progress (see AWS console screenshot).
I think the cdk deploy may fail because the health check fails, but it could also be that the health check fails because the deployment doesn't go through. That sounds confusing, and I definitely am confused.
I've set the healthCheckGracePeriod to Duration.seconds(3600), a full hour, which should be more than enough time for the health checks to come through...
I've read something about the health check possibly not being performed on port 80, but since I've opened up all traffic, this shouldn't be a problem, right?
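One way to rule that out is to pin the health check's port and protocol explicitly on the target group. A sketch based on the target block above (once everything works, healthyHttpCodes should probably also be tightened to just "200"):

listener.addTargets("RelayerTarget", {
  port: 80,
  targets: [relayerService],
  healthCheck: {
    path: "/health",
    port: "80", // pin the health check to the traffic port
    protocol: elbv2.Protocol.HTTP,
    healthyHttpCodes: "200",
  },
});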
After 6 hours of meetings with AWS engineers we figured it out...
Our approach was the following:
I set up an EC2 instance in the same subnet and performed a few connectivity tests against the ALB. I also set up another EC2 instance running a simple web server and attached it to the ALB as a target. With a few plain HTTP probes from the first instance I could confirm that the ALB itself was working perfectly.
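The same check, expressed as a small Node script (the DNS name here is a hypothetical placeholder; an internal ALB's name resolves only from inside the VPC):

import * as http from "http";

// Hypothetical internal ALB DNS name; substitute your own.
const url = "http://internal-relayer-alb-123456789.eu-central-1.elb.amazonaws.com/health";

http
  .get(url, (res) => {
    console.log(`status: ${res.statusCode}`); // 200 "OK" means ALB and target are fine
    res.resume();
  })
  .on("error", (err) => {
    console.error(`connection failed: ${err.message}`); // timeouts point at SGs/NACLs/routing
  });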
ECS was the problem
First I checked all security groups, NACLs, and the paths and ports of the container and the health checks. My application uses a .env file in which HOST=localhost and PORT=80 are defined. That is why curl http://localhost:80/health returned healthy from inside the container, while the health check from outside failed.
It must be HOST=0.0.0.0 in the .env for the application to be reachable from outside the container.
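The reason: a server bound to localhost (127.0.0.1) only accepts connections from inside the container itself, so the ALB health checker can never reach it; 0.0.0.0 binds to all interfaces. A minimal sketch of the binding logic (plain Node http server; the actual application differs, but the principle is the same):

import * as http from "http";

// HOST/PORT come from the .env file. With HOST=localhost only
// in-container requests succeed; HOST=0.0.0.0 also answers the
// ALB health checker.
const host = process.env.HOST ?? "0.0.0.0";
const port = Number(process.env.PORT ?? 80);

http
  .createServer((req, res) => {
    if (req.url === "/health") {
      res.writeHead(200);
      res.end("OK");
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(port, host, () => console.log(`listening on ${host}:${port}`));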