ECR image pull failing for Fargate tasks in a private subnet with a public IP


Firstly, thank you to whoever can answer this. It's my first post here, so I'll try to be as clear as possible.

I have an ECS cluster with two Fargate tasks configured. The ECS service is set up in a VPC and assigned to private subnets (with a NAT attached), and these tasks also have a public IP (I believe this should not be the case?).

One of the tasks is basically an API and the other is the front end for a webpage.

On top of this we have an internet-facing ALB. It is assigned to public subnets (with an IGW).

I decided to explore using NACLs, and for some reason only this rule set works (obviously):

Inbound:

Rule Number | Type        | Protocol | Port Range | Source    | Allow/Deny
1           | All Traffic | All      | All        | 0.0.0.0/0 | Allow

Outbound:

Rule Number | Type        | Protocol | Port Range | Destination | Allow/Deny
1           | All Traffic | All      | All        | 0.0.0.0/0   | Allow

I want to define explicit rules for the public and private subnets, so I've created two NACLs and associated the subnets with them.

Here are my rules for the public subnets, which host the internet-facing load balancer:

Public NACL:

Inbound:

Rule Number | Type   | Protocol | Port Range | Source         | Allow/Deny
100         | HTTPS  | TCP(6)   | 443        | 0.0.0.0/0      | Allow
110         | Custom | TCP(6)   | 1024-65535 | 1x.xx.xx.xx/16 | Allow

Outbound:

Rule Number | Type   | Protocol | Port Range | Destination    | Allow/Deny
100         | HTTPS  | TCP(6)   | 443        | 1x.xx.xx.xx/16 | Allow
110         | Custom | TCP(6)   | 3003       | 1x.xx.xx.xx/16 | Allow
120         | Custom | TCP(6)   | 1024-65535 | 0.0.0.0/0      | Allow

For the private subnets, the NACL rules are as follows:

Inbound:

Rule Number | Type   | Protocol | Port Range | Source         | Allow/Deny
100         | HTTPS  | TCP(6)   | 443        | 0.0.0.0/0      | Allow
110         | Custom | TCP(6)   | 1024-65535 | 1x.xx.xx.xx/16 | Allow

Outbound:

Rule Number | Type   | Protocol | Port Range | Destination    | Allow/Deny
100         | HTTPS  | TCP(6)   | 443        | 1x.xx.xx.xx/16 | Allow
110         | SMTP   | TCP(6)   | 25         | 0.0.0.0/0      | Allow
120         | SMTPS  | TCP(6)   | 465        | 0.0.0.0/0      | Allow
130         | Custom | TCP(6)   | 1024-65535 | 0.0.0.0/0      | Allow
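
To make the private-subnet rules above easier to reason about, here is a minimal Python sketch of how a network ACL is evaluated: rules are checked in ascending rule-number order, the first match decides, and anything unmatched hits the implicit deny. Because NACLs are stateless, the request and its return traffic are checked separately. This is only a simplified model, not AWS tooling and not a confirmed diagnosis. `VPC_CIDR`, `PUBLIC_ENDPOINT`, and `IN_VPC_TARGET` are hypothetical stand-ins (the real CIDR is masked as 1x.xx.xx.xx/16 in the tables), and it assumes the image pull is an HTTPS request to an address outside the VPC.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    number: int
    port_lo: int
    port_hi: int
    cidr: str
    allow: bool = True


def evaluate(rules, peer_ip, dest_port):
    """Apply rules in ascending rule-number order; the first match decides.

    peer_ip is the source address for inbound rules and the destination
    address for outbound rules. No match means the implicit deny applies.
    """
    for rule in sorted(rules, key=lambda r: r.number):
        port_ok = rule.port_lo <= dest_port <= rule.port_hi
        ip_ok = ip_address(peer_ip) in ip_network(rule.cidr)
        if port_ok and ip_ok:
            return rule.allow
    return False


# Assumptions (hypothetical values, not taken from the post):
VPC_CIDR = "10.0.0.0/16"          # stands in for the masked 1x.xx.xx.xx/16
PUBLIC_ENDPOINT = "203.0.113.10"  # documentation address for "some public registry IP"
IN_VPC_TARGET = "10.0.1.25"       # any address inside the VPC

# Private NACL as listed above.
PRIVATE_INBOUND = [
    Rule(100, 443, 443, "0.0.0.0/0"),
    Rule(110, 1024, 65535, VPC_CIDR),
]
PRIVATE_OUTBOUND = [
    Rule(100, 443, 443, VPC_CIDR),
    Rule(110, 25, 25, "0.0.0.0/0"),
    Rule(120, 465, 465, "0.0.0.0/0"),
    Rule(130, 1024, 65535, "0.0.0.0/0"),
]

# Task opens an HTTPS connection to an address outside the VPC.
outbound_request = evaluate(PRIVATE_OUTBOUND, PUBLIC_ENDPOINT, 443)
# The reply comes back from that address to an ephemeral port on the task.
return_traffic = evaluate(PRIVATE_INBOUND, PUBLIC_ENDPOINT, 50000)
# Same HTTPS request, but to a target inside the VPC.
in_vpc_request = evaluate(PRIVATE_OUTBOUND, IN_VPC_TARGET, 443)

print(outbound_request)  # False
print(return_traffic)    # False
print(in_vpc_request)    # True
```

In this model, port 443 outbound is only matched for destinations inside the VPC CIDR, and ephemeral-port replies are only matched when they come from inside the VPC. A NACL matches on the packet's own destination address, even when the route goes through a NAT, so under the stated assumptions an HTTPS request to a public address would fall through to the implicit deny in both directions.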

The service runs for a while without any issues and my website functions. However, on occasion I get a 504 error, and when I go back to ECS I see the tasks cycling through PROVISIONING -> PENDING and, after about 5-10 minutes, STOPPED. The error I get is this:

ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed

The only way to fix it is to revert to the default allow-all inbound and outbound rules.

Any idea what is causing this? Could it be that the public IP address assigned to the Fargate tasks is causing a conflict? The ECR repository is in another AWS account. I believe the IAM role permissions are correct; otherwise the tasks would not be able to pull the image even with the default NACL rules. I'd appreciate any help, as I am at a loss. Thank you.
