ECS: Cluster is not starting one-off tasks on EC2 instances

We have a particular cluster that we've only ever used for one-off tasks that are started via schedule on Fargate. We've cloned one of our Fargate task definitions to be EC2 (removing 'network mode' and relying on the default) and then redeployed one of our scheduled tasks to use it. However, its tasks are no longer being started. There are no immediately-failed tasks. Since it's not a 'service', there is no events log to audit. We also have our own task tracking, implemented by a Lambda function that records when all containers are started and stopped, and it shows nothing. When we switched the new task definition back to Fargate, the tasks began starting again, so it's not an [obvious] flaw in the task definition itself.

Note that we currently have two Fargate-specific capacity-providers and the only non-Fargate capacity-providers are autoscaling related (whereas we're talking about one-off tasks, not services where ASGs are relevant). I suspect this has something to do with it but I have no idea what to do about it:

[Screenshot: capacity providers]

I know that EC2 and Fargate tasks are supposed to be able to run on the same cluster concurrently, but this imbalance in the capacity providers doesn't make sense to me. Would I be able to delete these capacity providers without affecting our current Fargate tasks?

The new ECS instance:

[Screenshot: ECS instances]

1 answer

Answered by Mark B

> Note that we currently have two Fargate-specific capacity-providers and the only non-Fargate capacity-providers are autoscaling related (whereas we're talking about one-off tasks, not services where ASGs are relevant).

This is the problem. ASGs are relevant to any ECS task that needs to run on EC2, not just to ECS services. You need an EC2 capacity provider so that ECS can create an EC2 instance when a task needs one to run on. None of this is specific to ECS services. ECS services have their own service auto scaling (Application Auto Scaling), which is entirely separate from capacity provider EC2 auto scaling: it changes the number of tasks in a service based on a metric such as inbound request count or CPU usage. When the EC2 instances in the cluster cannot accommodate the tasks that ECS is trying to run, the EC2 capacity provider scales out the number of instances through the EC2 ASG.

So the solution to your issue is to add an EC2 Capacity Provider to the ECS cluster.
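As a rough sketch of what adding the provider could look like with boto3, the helper below only builds the request arguments for the ECS `create_capacity_provider` and `put_cluster_capacity_providers` calls. The cluster name, provider name, and ASG ARN are hypothetical placeholders, and the managed scaling settings are an assumption rather than a recommendation. Since `put_cluster_capacity_providers` replaces the cluster's whole provider list and default strategy, the existing Fargate entries are passed back in unchanged.

```python
def build_capacity_provider_requests(cluster, asg_arn, provider_name,
                                     existing_providers, existing_default_strategy):
    """Build kwargs for the two ECS calls. Makes no AWS calls itself."""
    create_kwargs = {
        "name": provider_name,  # hypothetical name chosen by the caller
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            # Assumed settings: let ECS scale the ASG to fit pending tasks.
            "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        },
    }
    # The attach call overwrites the cluster's list, so repeat the providers
    # that are already there (for example FARGATE and FARGATE_SPOT).
    providers = list(existing_providers)
    if provider_name not in providers:
        providers.append(provider_name)
    attach_kwargs = {
        "cluster": cluster,
        "capacityProviders": providers,
        # Keep the current default strategy so existing tasks are unaffected.
        "defaultCapacityProviderStrategy": list(existing_default_strategy),
    }
    return create_kwargs, attach_kwargs


def apply_requests(ecs_client, create_kwargs, attach_kwargs):
    """Send both requests using a client such as boto3.client("ecs")."""
    ecs_client.create_capacity_provider(**create_kwargs)
    ecs_client.put_cluster_capacity_providers(**attach_kwargs)
```

The current provider list and default strategy can be read from `describe_clusters` before calling the helper, so that nothing already attached to the cluster is dropped.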


> I know that EC2 and Fargate tasks are supposed to be able to run on the same cluster concurrently, but this imbalance in the capacity providers doesn't make sense to me.

For tasks that request to run on Fargate, the Fargate capacity provider will be used. For tasks that request to run on EC2, the EC2 capacity provider will be used. There is no imbalance here.
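To illustrate how a task "requests" one or the other, here is a minimal sketch that builds arguments for the ECS `run_task` call. The function name and the example values are hypothetical. It relies on two API rules: `launchType` and `capacityProviderStrategy` cannot both be set, and when neither is given the cluster's default capacity provider strategy applies. Scheduled-task targets expose comparable settings, so the same choice applies there.

```python
def build_run_task_kwargs(cluster, task_definition,
                          capacity_provider=None, launch_type=None):
    """Build kwargs for ecs.run_task. Makes no AWS calls itself."""
    if capacity_provider is not None and launch_type is not None:
        # The API rejects requests that specify both.
        raise ValueError("use capacity_provider or launch_type, not both")
    kwargs = {"cluster": cluster, "taskDefinition": task_definition, "count": 1}
    if capacity_provider is not None:
        # Route the task to one named capacity provider.
        kwargs["capacityProviderStrategy"] = [
            {"capacityProvider": capacity_provider, "weight": 1}
        ]
    elif launch_type is not None:
        kwargs["launchType"] = launch_type  # "EC2" or "FARGATE"
    # With neither set, ECS uses the cluster's default strategy.
    return kwargs


# Hypothetical names, for illustration only.
ec2_request = build_run_task_kwargs("my-cluster", "my-task:1",
                                    capacity_provider="my-ec2-provider")
fargate_request = build_run_task_kwargs("my-cluster", "my-task:1",
                                        capacity_provider="FARGATE")
```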

> Would I be able to delete these capacity providers without affecting our current Fargate tasks?

No. If you do that, then all Fargate tasks in the cluster will stop running. You simply need to add an EC2 capacity provider to the cluster, and leave the Fargate capacity providers as they are.