I have run into an issue with my Fargate service scaling down. The service auto-scaling is based on the age of the oldest SQS message, so it scales down when there are no old messages.
The problem is that some of the containers being terminated are still processing SQS messages at that point. Because this is an SQS FIFO queue, it stays stuck until the visibility timeout of the abandoned message expires.
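To put a number on that lag: SQS makes an abandoned message visible again only once its visibility timeout (counted from when it was received) runs out, and a FIFO queue does not deliver later messages from the same message group while that message is in flight. The small sketch below computes the remaining stall. The function name and the numbers are illustrative assumptions, and it assumes all messages share one MessageGroupId, so the whole queue is blocked rather than a single group.

```python
def remaining_stall(visibility_timeout_s: float, seconds_since_receive: float) -> float:
    """Seconds until SQS makes an abandoned message visible again.

    Assumes the worker was killed without deleting the message and without
    changing its visibility, so the original timeout still applies.
    """
    if visibility_timeout_s < 0 or seconds_since_receive < 0:
        raise ValueError("durations must be non-negative")
    # Once the timeout has already passed, the message is visible again: no stall.
    return max(0.0, visibility_timeout_s - seconds_since_receive)


# Hypothetical numbers: a 300 s visibility timeout for 1-2 minute tasks,
# and a container stopped 30 s into processing a message.
stall = remaining_stall(300, 30)
print(f"message group blocked for about {stall:.0f} s")
```

The longer the visibility timeout relative to how quickly a container is stopped, the longer the group stays blocked. That is why a timeout sized for 1-2 minute tasks turns each interrupted task into a multi-minute stall.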
From this redis thread I assume that a year ago there was no option to keep a task that is still working from being terminated during scale-down. Is that still the case, or has anyone found a good workaround?
I am using Celery (Python) to consume SQS messages. Since these are long-running tasks (1-2 minutes), I have a fairly large visibility timeout, which causes a long lag whenever a running container is cut off.
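For context, a Celery worker consuming SQS is usually configured through a config module such as the one below. The post does not show its actual settings, so every value here is an assumption. The visibility timeout is kept above the longest task duration. Late acknowledgement means the SQS message is deleted only after the task finishes. A prefetch multiplier of 1 keeps the worker from reserving more messages than it has processes. The region is a placeholder, and FIFO-specific options (queue names ending in .fifo, message group IDs) are left out.

```python
# celeryconfig.py: hypothetical values for a Celery worker reading from SQS.
broker_url = "sqs://"  # credentials come from the task role / environment

broker_transport_options = {
    "region": "eu-west-1",       # placeholder region (assumption)
    "visibility_timeout": 300,   # seconds; must exceed the longest task (1-2 min)
    "polling_interval": 1,       # seconds between polls when the queue is empty
}

# Acknowledge (delete) the SQS message only after the task has finished,
# so a killed task's message becomes visible again instead of being lost.
task_acks_late = True

# Reserve at most one message per worker process, so fewer messages sit
# in flight inside a container that might be stopped.
worker_prefetch_multiplier = 1
```

With late acknowledgement, a message whose container is stopped mid-task is redelivered after the visibility timeout rather than dropped. That is exactly the trade-off described above: no lost work, but a delay as long as the remaining timeout.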
I can't use Lambda functions because the container is larger than 1 GB.
Any help would be appreciated :)