I've been trying to find out more information on having a retry and error queue where my code fails rather than having in-memory retries in my application.
Here's my scenario:
I'm sending a message saying something along the lines of:
"Process the output for this task - it wants you to update this XML file with the contents of this stream".
I have code that writes output to an XML file, but it can occasionally fail and need to be retried, as it's possible that another part of my application (or another person) is using the file at that point in time.
What I'm trying to do is say "Whenever the output code fails, resend the SQS message that told it to start the output process, or send a new one with the same info", hence retrying the message. Also, once it has retried and failed 100 times, I want to move it to an error queue.
Does anyone know of any kind of implementation of this? I'm trying to see something that's already been done before I start implementing.
SQS does everything you want already, without much effort:
Your code should put the message into the queue that says "Process the output for this task - it wants you to update this XML file with the contents of this stream".
Your worker task polls the queue, gets the message, and begins the work. I like to use Windows services for this, but a cron job or scheduled task works as well.
Only if the worker completes the task successfully does it delete the message from the queue - that is the very last thing that the worker job should do before quitting. It doesn't remove the message from the queue when it gets it, it only removes the message if it succeeds in processing it.
If the worker does not complete, and thus the message is still in the queue, then after the visibility timeout expires the message will be in the queue again (or, more accurately, be visible in the queue again) automatically - you don't need to put it back in the queue.
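The steps above can be sketched roughly as follows. This is a minimal illustration rather than a finished worker: the function names `enqueue_task` and `poll_once` and the `handler` callback are made up for this example, and `sqs` is assumed to be a boto3 SQS client (or anything with the same `send_message`, `receive_message` and `delete_message` methods).

```python
import json


def enqueue_task(sqs, queue_url, task):
    """Producer side: put the task description on the queue as JSON."""
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(task))


def poll_once(sqs, queue_url, handler):
    """Worker side: receive at most one message and try to process it.

    The message is deleted only after handler() returns without raising.
    On failure nothing is deleted, so SQS makes the message visible again
    once the visibility timeout expires. Returns the number of messages
    that were processed successfully (0 or 1).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling, to avoid hammering an empty queue
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handler(json.loads(message["Body"]))
        except Exception:
            # Do NOT delete: leaving the message alone is the retry.
            continue
        # Deleting is the very last step, and only happens on success.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

With real AWS credentials you would create the client with `boto3.client("sqs")` and call `poll_once` in a loop from your service or scheduled job; `handler` would be your existing code that writes the XML file and raises when the file is locked.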
To implement the 'fail after 100 tries', you'll want to set up a 'Dead Letter Queue': http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/SQSDeadLetterQueue.html
You tell the source queue (through the `maxReceiveCount` value in its redrive policy) that if a message has been received 100 times (configurable from 1 to 1000) without being successfully processed and deleted, it should automatically be moved to the specified dead-letter queue.
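If you would rather configure this from code than from the console, a sketch along these lines should work. The helper names `build_redrive_policy` and `attach_dead_letter_queue` are hypothetical, `sqs` is again assumed to be a boto3 SQS client, and the dead-letter queue is assumed to exist already so that you have its ARN.

```python
import json


def build_redrive_policy(dead_letter_queue_arn, max_receive_count=100):
    """Return the RedrivePolicy attribute value as a JSON string."""
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("max_receive_count must be between 1 and 1000")
    return json.dumps({
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": str(max_receive_count),
    })


def attach_dead_letter_queue(sqs, queue_url, dead_letter_queue_arn,
                             max_receive_count=100):
    """Point the source queue at the dead-letter queue."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={
            "RedrivePolicy": build_redrive_policy(
                dead_letter_queue_arn, max_receive_count
            ),
        },
    )
```

Once the policy is attached, the worker needs no retry counter of its own: every failed attempt simply counts as another receive, and SQS moves the message when the limit is exceeded.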
Couldn't be simpler - SQS does all the work for you with a few clicks of the mouse. All you need to do is write the code that puts the original message in the first queue, does the work, and deletes the message from the queue if/when the task completes successfully.