Amazon AWS, messages from SQS queue delivered multiple times


I have a worker running on Elastic Beanstalk that accepts POST requests via messages from a queue. These messages trigger a long operation that takes several minutes (sometimes even hours), and it is crucial that this operation is executed only once.

The problem is that when I log in to the worker console to watch the process, the message seems to be delivered every minute, over and over again (the method triggered by receiving the request gets called every minute). How can I get rid of this behavior?

I read the documentation and set the Visibility Timeout to the maximum value (12 hours) for both the service queue and the dead letter queue. However, this does not help at all.

When I send the message, it is displayed as "in flight" (which I think is the expected behavior, since the queue waits for a delete request or some other acknowledgement, which is only sent at the end of the long operation).

Could someone give me a hint about what is going on in this scenario? I probably missed some important detail in the configuration...

EDIT: it seems that the message is redelivered every minute as long as it is "in flight". Once I finish the process, the message finally disappears.

David Murray (best answer)

There's an extra layer of complexity here because you're not polling the SQS queue directly; there's a worker process deployed by Elastic Beanstalk called sqsd that's polling the queue on your behalf, POSTing any messages it gets to your application, and deleting them from the queue when you respond with a 200.

The VisibilityTimeout setting on the queue controls how long the queue waits after delivering a message to the consumer (in this case, sqsd) before it assumes something has gone wrong and re-delivers the message to someone else. sqsd has a similar concept (called "InactivityTimeout") that controls how long it waits after POSTing to your application before it assumes something has gone wrong and retries. You'll need to configure this to also be high enough that sqsd doesn't re-send the request to your application before you finish processing it. I've seen reports of another "ProxyTimeout" setting that might need to be adjusted as well.
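A minimal sketch of raising both timeouts with boto3, assuming Python and an existing worker environment. `InactivityTimeout` and `VisibilityTimeout` are options in the `aws:elasticbeanstalk:sqsd` namespace; the helper names, the environment name, and the 60-second margin are illustrative choices, and the `ProxyTimeout` setting mentioned above is not covered. Elastic Beanstalk also enforces its own upper bound on `InactivityTimeout`, which this sketch does not check.

```python
from typing import Dict, List

SQSD_NAMESPACE = "aws:elasticbeanstalk:sqsd"
SQS_MAX_VISIBILITY_SECONDS = 43200  # SQS caps visibility timeouts at 12 hours


def sqsd_timeout_settings(max_job_seconds: int, margin_seconds: int = 60) -> List[Dict[str, str]]:
    """Build OptionSettings so that neither sqsd nor SQS retries a running job.

    InactivityTimeout: how long sqsd waits for your application's HTTP response.
    VisibilityTimeout: how long SQS hides a received message from other consumers.
    The visibility timeout gets a small margin on top (an illustrative choice),
    so SQS does not hand the message out again while sqsd is still waiting.
    """
    visibility = max_job_seconds + margin_seconds
    if visibility > SQS_MAX_VISIBILITY_SECONDS:
        raise ValueError("job too long for a single SQS visibility timeout")
    return [
        {"Namespace": SQSD_NAMESPACE, "OptionName": "InactivityTimeout",
         "Value": str(max_job_seconds)},
        {"Namespace": SQSD_NAMESPACE, "OptionName": "VisibilityTimeout",
         "Value": str(visibility)},
    ]


def apply_settings(eb_client, environment_name: str, settings: List[Dict[str, str]]) -> None:
    """eb_client is expected to be boto3.client("elasticbeanstalk")."""
    eb_client.update_environment(EnvironmentName=environment_name, OptionSettings=settings)
```

For example, `apply_settings(boto3.client("elasticbeanstalk"), "my-worker-env", sqsd_timeout_settings(4 * 3600))` would request a 4-hour inactivity timeout for a hypothetical environment named `my-worker-env`.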

More generally, keep in mind that exactly-once delivery can't be guaranteed in a distributed system. Even if you get all the timeouts right so it works correctly most of the time, there's always the possibility that you'll crash after completing the operation but before you can tell SQS about it, and the message will be redelivered to someone else. The closest you can get is to make sure that if a message is delivered twice, the result is exactly the same: for example, have your processing logic check whether the thing it's about to do has already been done, and if so, immediately return a 200.
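A minimal sketch of that check, assuming each message body is JSON carrying a unique `job_id` (a hypothetical field). The `completed` set stands in for a durable store such as a database table; with a real store you would also want the check-and-record step to be atomic, so two concurrent deliveries cannot both start the job.

```python
import json
from typing import Callable


def handle_sqsd_post(body: str, completed: set, run_job: Callable[[dict], None]) -> int:
    """Return the HTTP status to send back to sqsd for this POST body."""
    job = json.loads(body)
    job_id = job["job_id"]  # hypothetical unique id carried in the message
    if job_id in completed:
        # Already done on an earlier delivery: answer 200 so sqsd deletes it.
        return 200
    try:
        run_job(job)
    except Exception:
        # sqsd only deletes the message on 200, so it will be retried later.
        return 500
    completed.add(job_id)  # record success before acknowledging
    return 200
```

There is still a window between `run_job` finishing and the id being recorded, which is exactly the crash scenario described above, so ideally the job itself should also tolerate being repeated.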

Daniel777

It seems like you forgot to delete the message after processing it.

After you dequeue a message, you need to delete it. If you don't delete it explicitly, SQS assumes that you dequeued the message and failed to process it, so the message becomes visible in the queue again once its visibility timeout expires.
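A minimal sketch of the delete-after-processing pattern with boto3's SQS client, assuming `message` is one entry from a `receive_message` response and `process` is a hypothetical callable for your own work.

```python
from typing import Callable


def process_and_delete(sqs, queue_url: str, message: dict,
                       process: Callable[[str], None]) -> bool:
    """sqs is expected to be boto3.client("sqs"). Returns True if deleted."""
    try:
        process(message["Body"])
    except Exception:
        # No delete: the message becomes visible again after its
        # visibility timeout expires, so it will be retried.
        return False
    # Delete using the receipt handle from this receive (not the MessageId).
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```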

There are two timeout parameters you can set in SQS, and both are important:

  1. WaitTimeSeconds

  2. VisibilityTimeout

1) WaitTimeSeconds = 10 means that your call to SQS returns immediately if there are messages in the queue, BUT if the queue is empty, the call blocks until a message arrives, for at most 10 seconds.

2) Once you have dequeued a message, VisibilityTimeout = 60 means you have 60 seconds to process it; otherwise it will appear in the queue again. If you finish processing the message within 60 seconds, you MUST send a deleteMessage request. If you don't send that deleteMessage request before the 60 seconds are up, the message will reappear in the queue.

If you send the deleteMessage request after 60 seconds, it may have no effect, and the message may reappear anyway (for example, if another consumer has already received it with a new receipt handle).

Write your code so that if processing fails, it never reaches the deleteMessage request; the message will then naturally reappear in SQS.
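Both parameters can be passed on each `receive_message` call with boto3. This is a minimal sketch using the values from this answer; `receive_one` is a hypothetical helper name, and SQS accepts at most 20 seconds for `WaitTimeSeconds`.

```python
def receive_one(sqs, queue_url: str, wait_seconds: int = 10, visibility_seconds: int = 60):
    """sqs is expected to be boto3.client("sqs"). Returns one message or None."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,          # long poll: block up to this long if empty
        VisibilityTimeout=visibility_seconds,  # hide the message from others this long
    )
    # "Messages" is absent from the response when nothing arrived in time.
    messages = response.get("Messages", [])
    return messages[0] if messages else None
```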

You can find detailed info about 1) and 2) here:

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/MessageLifecycle.html

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html

http://boto.readthedocs.org/en/latest/ref/sqs.html#boto.sqs.queue.Queue.get_messages

Rohit

With SQS you have to call the delete API explicitly to remove the message from the queue. Setting a high timeout value only ensures that no other poller receives the same message during that time.

You have two options:

  1. Delete the message as soon as you read it, and then start the downstream process.

  2. Read the message, set its visibility timeout to the expected duration of your process, and delete the message as the last step of the process.
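A minimal sketch of both options with boto3's SQS client, assuming you poll the queue yourself (under Elastic Beanstalk, sqsd holds the receipt handle, so the sqsd timeout settings apply instead). `run_job` is a hypothetical callable, and the function names are illustrative.

```python
from typing import Callable

MAX_VISIBILITY_SECONDS = 43200  # SQS maximum visibility timeout (12 hours)


def delete_then_run(sqs, queue_url: str, message: dict,
                    run_job: Callable[[str], None]) -> None:
    """Option 1 (at most once): if run_job crashes, the message is already gone."""
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    run_job(message["Body"])


def extend_then_run(sqs, queue_url: str, message: dict,
                    run_job: Callable[[str], None], job_seconds: int) -> None:
    """Option 2: hide the message for the job's duration, delete as the last step."""
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=min(job_seconds, MAX_VISIBILITY_SECONDS),  # clamped to 12 h
    )
    run_job(message["Body"])  # if this raises, the message reappears after the timeout
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

Option 1 trades the risk of duplicates for the risk of losing a job when the process crashes; option 2 keeps the message recoverable until the final delete. The 12-hour cap also limits how long a message can stay hidden overall, so jobs longer than that likely need a different approach.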