I have a microservice backed by a Postgres database, and I perform some CRUD operations with this service. For some of the insert operations, I need to pass the newly inserted record on to another service, and I'm using a messaging system in between.
The architecture is such that I run several instances of this microservice behind a load balancer, and every insert is processed by one of the running instances. To implement the transactional outbox, I have a table where I write the intent, and a simple polling mechanism in the microservice itself polls this table every minute, fetches the intents from the outbox table, and sends them to the messaging system. Now I have several questions:
If I run multiple instances of this microservice, several of them might select the same records, which could result in duplicates, unnecessary resource utilization, etc.
What do I do after publishing the intent to the message broker? Should I mark this record in the outbox table as successfully sent to the message broker? This sounds like my original problem, where I want to write to the database and to the external system in one commit, only with the order reversed. So where is the real benefit?
Are there any other, simpler alternatives?
When working with distributed systems, asynchronous communication, and messaging, you should consider which delivery guarantees your app requires. There are basically 3 options: at most once, at least once, and exactly once.
The last one (which you think you need) is considered impossible in the general case (though exactly-once semantics for processing can be achieved with some deduplication handling, as Kafka does).
Most of the applications I've seen use or require "at least once" delivery for business messages, so the subscriber should be able to deduplicate (or handle duplicates in some other way). Basically, you will want to include a unique id in each message so the handler can maintain a collection of processed messages and filter out duplicates.
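To make the subscriber-side deduplication a bit more concrete, here is a minimal sketch in Python. The names `Message`, `DeduplicatingHandler`, and `on_message` are made up for illustration, and the in-memory set is an assumption that only works for a single process; a real subscriber would more likely keep processed ids in a durable store.

```python
from dataclasses import dataclass
from typing import Any, Callable, Set


@dataclass
class Message:
    # Hypothetical message shape: a unique id assigned by the publisher
    # plus an arbitrary payload.
    message_id: str
    payload: Any


class DeduplicatingHandler:
    """Wraps a business handler and skips messages it has already seen."""

    def __init__(self, handle: Callable[[Any], None]) -> None:
        self._handle = handle
        # Assumption: in-memory only. It is lost on restart, so a real
        # system would persist these ids (for example in a database table).
        self._processed: Set[str] = set()

    def on_message(self, msg: Message) -> bool:
        """Return True if the message was handled, False if it was a duplicate."""
        if msg.message_id in self._processed:
            return False
        self._handle(msg.payload)
        # Record the id only after the handler succeeded, so a failed
        # message can be redelivered and retried.
        self._processed.add(msg.message_id)
        return True
```

With "at least once" delivery the broker may hand over the same message twice; the second call to `on_message` then returns `False` and the business handler is not invoked again.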
Personally, I would go with just one instance of the outbox processor (i.e. a separate service for it, or some infrastructural magic so that only one of the service instances performs the processing), and I would argue that this should be enough in most cases (you will still need dedupe logic on the subscriber side, but this will basically solve the issue of unnecessary resource utilization).
If you still want to process the outbox from multiple service instances then you have at least 2 options:
SELECT ... FOR UPDATE, send selected messages, update the processed rows, commit the transaction.
TL;DR
"Is there any other simpler alternatives?" - yes, create a single instance dedicated service which will process the outbox. Either way assign unique message ids and implement deduplication on the subscriber side.