I have a worker running on Elastic Beanstalk which accepts POST requests via messages from a queue. These messages trigger a long operation which takes several minutes (sometimes even hours), and it is crucial that this operation is executed only once.
The problem is that when I log in to the worker console to watch the process, the message seems to be delivered every minute, over and over again (the method triggered by the incoming request gets called each minute). How can I get rid of this behavior?
I read the documentation and set the visibility timeout to the maximum value (12 hours) for both the service queue and the dead-letter queue. This, however, does not help at all.
When I send the message, it is displayed as "in flight" (which is expected behavior, I think, since the queue waits to receive a delete request or some kind of answer, which is only provided at the end of the long operation).
Could someone hint at what is going on in this scenario? I probably missed some important detail in the configuration...
EDIT: it seems that the message is being redelivered every minute as long as it is "in flight". Once the process finishes, the message finally disappears.
There's an extra layer of complexity here because you're not polling the SQS queue directly: Elastic Beanstalk deploys a worker daemon called sqsd that polls the queue on your behalf, POSTs any messages it gets to your application, and deletes them from the queue when you respond with a 200.

The VisibilityTimeout setting on the queue controls how long SQS waits after delivering a message to a consumer (in this case, sqsd) before it assumes something has gone wrong and re-delivers the message to someone else. sqsd has a similar concept, called "InactivityTimeout", that controls how long it waits after POSTing to your application before it assumes something has gone wrong and retries. You'll need to set this high enough as well, so that sqsd doesn't re-send the request to your application before you finish processing it. I've seen reports of another setting, "ProxyTimeout", that might need to be adjusted too.
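For example, both timeouts can be raised from an `.ebextensions` config file in your application bundle. This is a sketch using the `aws:elasticbeanstalk:sqsd` option namespace; the file name and the specific values here are illustrative, not recommendations:

```yaml
# .ebextensions/worker-timeouts.config  (values are illustrative)
option_settings:
  aws:elasticbeanstalk:sqsd:
    # Seconds sqsd waits for your app to respond to the POST before retrying.
    InactivityTimeout: 1800
    # Visibility timeout sqsd applies to messages it pulls from the queue;
    # keep it at least as long as InactivityTimeout.
    VisibilityTimeout: 1800
```

Whatever values you pick, the key invariant is that both timeouts exceed your worst-case processing time.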
More generally, keep in mind that exactly-once delivery isn't physically possible to guarantee in a distributed system. Even if you get all the timeouts right so it works correctly most of the time, there's always the possibility that you'll crash after completing the operation but before you can tell SQS about it, and the message will be re-delivered. The closest you can get is to make the processing idempotent, so that if a message is delivered twice the result is exactly the same - for example, by having your processing logic check whether the thing it's about to do has already been done, and if so immediately returning a 200.
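A minimal sketch of that idempotency check, assuming the message carries a stable ID (sqsd forwards the SQS message ID in the `X-Aws-Sqsd-Msgid` request header). The in-memory set, the `handle_message` function, and `run_long_operation` are all hypothetical names for illustration; a real worker would record completed IDs in a durable store shared across instances:

```python
# Hypothetical stand-in for a durable store of finished message IDs
# (e.g. a database table) - an in-memory set only works per-process.
completed = set()

def run_long_operation(payload):
    """Placeholder for the multi-minute (or multi-hour) job."""
    pass

def handle_message(msg_id, payload):
    """Process a message idempotently; duplicate deliveries ack fast."""
    if msg_id in completed:
        # Re-delivery of work that already finished: acknowledge
        # immediately instead of running the long operation again.
        return 200
    run_long_operation(payload)
    completed.add(msg_id)  # record success before acknowledging
    return 200
```

Note the remaining crash window: if the process dies between `run_long_operation` and recording the ID, the message is redone on redelivery - which is exactly the at-least-once behavior described above, so the operation itself should also be safe to repeat where possible.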