First, thank you to whoever can answer this. It's my first post here, so I'll try to be as clear as possible.
I have an ECS cluster with two Fargate tasks configured. The ECS service is set up in a VPC and assigned to private subnets (with a NAT gateway attached), and these tasks also have a public IP (I believe this should not be the case?).
One of the tasks is basically an API and the other is the front end for a webpage.
In front of this we have an internet-facing ALB, which is assigned to public subnets (with an internet gateway).
I decided to explore using NACLs, and for some reason only this (obviously wide-open) rule set works:
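On the public IP question: for Fargate tasks, this comes from the service's `awsvpcConfiguration`, specifically the `assignPublicIp` field (`ENABLED` or `DISABLED`). A minimal sketch of that block as a Python dict, in the shape boto3's `ecs.update_service(..., networkConfiguration=...)` accepts. The subnet and security group IDs are hypothetical placeholders, not values from this setup:

```python
import json

# Hypothetical IDs: substitute the real private subnet and security group IDs.
network_configuration = {
    "awsvpcConfiguration": {
        "subnets": ["subnet-0aaa1111", "subnet-0bbb2222"],
        "securityGroups": ["sg-0ccc3333"],
        # With a NAT gateway route in the private subnets, outbound traffic
        # does not depend on a task-level public IP.
        "assignPublicIp": "DISABLED",
    }
}

print(json.dumps(network_configuration, indent=2))
```

The setting applies to tasks as they are launched, so existing tasks keep their current IPs until they are replaced.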
Inbound:
Rule Number | Type | Protocol | Port Range | Source | Allow/Deny |
---|---|---|---|---|---|
1 | All Traffic | All | All | 0.0.0.0/0 | Allow |
Outbound:
Rule Number | Type | Protocol | Port Range | Destination | Allow/Deny |
---|---|---|---|---|---|
1 | All Traffic | All | All | 0.0.0.0/0 | Allow |
I want to define explicit rules for the public and private subnets, so I've created two NACLs and associated the respective subnets with them.
For the public subnets, which host the internet-facing load balancer, here are my rules:
Public NACL:
Inbound:
Rule Number | Type | Protocol | Port Range | Source | Allow/Deny |
---|---|---|---|---|---|
100 | HTTPS | TCP(6) | 443 | 0.0.0.0/0 | Allow |
110 | Custom | TCP(6) | 1024-65535 | 1x.xx.xx.xx/16 | Allow |
Outbound:
Rule Number | Type | Protocol | Port Range | Destination | Allow/Deny |
---|---|---|---|---|---|
100 | HTTPS | TCP(6) | 443 | 1x.xx.xx.xx/16 | Allow |
110 | Custom | TCP(6) | 3003 | 1x.xx.xx.xx/16 | Allow |
120 | Custom | TCP(6) | 1024-65535 | 0.0.0.0/0 | Allow |
For the private subnets, the NACL rules are as follows:
Inbound:
Rule Number | Type | Protocol | Port Range | Source | Allow/Deny |
---|---|---|---|---|---|
100 | HTTPS | TCP(6) | 443 | 0.0.0.0/0 | Allow |
110 | Custom | TCP(6) | 1024-65535 | 1x.xx.xx.xx/16 | Allow |
Outbound:
Rule Number | Type | Protocol | Port Range | Destination | Allow/Deny |
---|---|---|---|---|---|
100 | HTTPS | TCP(6) | 443 | 1x.xx.xx.xx/16 | Allow |
110 | SMTP | TCP(6) | 25 | 0.0.0.0/0 | Allow |
120 | SMTPS | TCP(6) | 465 | 0.0.0.0/0 | Allow |
130 | Custom | TCP(6) | 1024-65535 | 0.0.0.0/0 | Allow |
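A rough way to sanity-check rule sets like these is to walk each NACL in rule-number order for a given packet: NACLs are stateless, the lowest-numbered matching rule wins, and anything unmatched falls through to the implicit deny. The sketch below is a simplified model under several assumptions: it only handles TCP over IPv4; `port` is the packet's destination port and `remote_ip` is the other end of the connection; `10.0.0.0/16` is a hypothetical stand-in for the redacted `1x.xx.xx.xx/16` VPC CIDR; and `203.0.113.10` is a hypothetical public IP standing in for an ECR or Secrets Manager endpoint reached through the NAT gateway.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    number: int
    protocol: str           # "tcp" or "all"
    ports: Optional[range]  # None means all ports
    cidr: str
    allow: bool


def evaluate(rules: List[Rule], protocol: str, port: int, remote_ip: str) -> bool:
    """Walk the rules in ascending rule-number order; the first match decides.

    `port` is the packet's destination port and `remote_ip` is the other end
    of the connection (source for inbound, destination for outbound).
    If nothing matches, the implicit '*' rule denies the packet.
    """
    ip = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("all", protocol):
            continue
        if rule.ports is not None and port not in rule.ports:
            continue
        if ip not in ipaddress.ip_network(rule.cidr):
            continue
        return rule.allow
    return False  # implicit deny


def single(p: int) -> range:
    """A one-port range, e.g. single(443) covers only port 443."""
    return range(p, p + 1)


TCP = "tcp"
ANY = "0.0.0.0/0"
VPC_CIDR = "10.0.0.0/16"      # hypothetical stand-in for 1x.xx.xx.xx/16
ENDPOINT_IP = "203.0.113.10"  # hypothetical public ECR / Secrets Manager IP
CLIENT_IP = "198.51.100.7"    # hypothetical internet client
ALB_IP = "10.0.0.25"          # hypothetical ALB node IP inside the VPC
TASK_IP = "10.0.1.20"         # hypothetical task IP inside the VPC
EPHEMERAL = range(1024, 65536)

ALLOW_ALL = [Rule(1, "all", None, ANY, True)]
PUBLIC_IN = [Rule(100, TCP, single(443), ANY, True),
             Rule(110, TCP, EPHEMERAL, VPC_CIDR, True)]
PUBLIC_OUT = [Rule(100, TCP, single(443), VPC_CIDR, True),
              Rule(110, TCP, single(3003), VPC_CIDR, True),
              Rule(120, TCP, EPHEMERAL, ANY, True)]
PRIVATE_IN = [Rule(100, TCP, single(443), ANY, True),
              Rule(110, TCP, EPHEMERAL, VPC_CIDR, True)]
PRIVATE_OUT = [Rule(100, TCP, single(443), VPC_CIDR, True),
               Rule(110, TCP, single(25), ANY, True),
               Rule(120, TCP, single(465), ANY, True),
               Rule(130, TCP, EPHEMERAL, ANY, True)]

FLOWS = [
    ("client -> ALB:443, public inbound", PUBLIC_IN, 443, CLIENT_IP),
    ("ALB -> task:3003, public outbound", PUBLIC_OUT, 3003, TASK_IP),
    ("ALB -> task:3003, private inbound", PRIVATE_IN, 3003, ALB_IP),
    ("task -> endpoint:443, private outbound", PRIVATE_OUT, 443, ENDPOINT_IP),
    ("endpoint reply, private inbound", PRIVATE_IN, 50000, ENDPOINT_IP),
    ("NAT -> endpoint:443, public outbound", PUBLIC_OUT, 443, ENDPOINT_IP),
    ("endpoint reply to NAT, public inbound", PUBLIC_IN, 50000, ENDPOINT_IP),
    ("any packet, allow-all NACL", ALLOW_ALL, 443, ENDPOINT_IP),
]

for label, rules, dest_port, remote in FLOWS:
    verdict = "ALLOW" if evaluate(rules, TCP, dest_port, remote) else "DENY"
    print(f"{label}: {verdict}")
```

Under those assumptions, the client-to-ALB and ALB-to-task paths come out as allowed, while the HTTPS hop to a public endpoint (from the task and from the NAT gateway) and the ephemeral-port replies coming back come out as denied. Whether that is the actual cause here is not confirmed by this model; it only reflects the rules as written.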
The service runs for a while without any issues and my website works. However, occasionally I get a 504 error, and when I go back to ECS I see the tasks going through PROVISIONING -> PENDING and then, after about 5-10 minutes, STOPPED. The error I get is:
`ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed`
The only way to fix it is to revert to the default allow-all inbound and outbound rules.
Any idea what is causing this? Could the public IP address assigned to the Fargate tasks be causing a conflict? The ECR repository is in another AWS account. I believe the IAM role permissions are correct; otherwise the tasks would not be able to pull the image even with the default NACL rules. I'd appreciate any help, as I am at a loss. Thank you.