How to handle batch processing in an ESB?


We have a legacy system that produces files, each containing hundreds of messages (financial transactions). We need to transform these messages into another format and submit them (individually) to a target system. The question is: should the ESB accept these files for processing directly, or should there be an adapter application between the legacy system and the ESB that splits received files into individual messages and lets the ESB process the messages individually (instead of processing the whole file)?

In the first solution we expect two ESB flows. The first would transform the file into the new format, split it into individual messages, and store those messages in a temporary location. The transformation needs to process the file as a whole, because the file contains some common sections that are needed for the transformation of the individual messages. The second flow would take the individual transformed messages (each in a separate DB transaction), pass them to the target system, and wait for its answer (synchronously or asynchronously).
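For illustration, here is a minimal plain-Java sketch of that two-flow design. All the names (FileTransformer, MessageStore, TargetSystemClient, CommonSections) are hypothetical stand-ins, not any ESB product's API:

```java
// A minimal plain-Java sketch of the two flows; the collaborator
// interfaces below are illustrative stand-ins, not a product API.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BatchFlows {

    // Flow 1: the file must be read as a whole, because the common
    // sections are needed to transform each individual message.
    void transformAndSplit(Path file, FileTransformer transformer,
                           MessageStore store) throws Exception {
        String raw = Files.readString(file);
        CommonSections common = transformer.parseCommonSections(raw);
        List<String> messages = transformer.transformMessages(raw, common);
        store.saveAll(messages); // staged in the temporary location
    }

    // Flow 2: each staged message is submitted in its own DB transaction,
    // so a single failure does not roll back the whole batch.
    void submitIndividually(MessageStore store, TargetSystemClient target) {
        for (String msg : store.pending()) {
            store.inTransaction(() -> {
                target.submit(msg); // answer handled sync or async
                store.markDone(msg);
            });
        }
    }
}

interface FileTransformer {
    CommonSections parseCommonSections(String raw);
    List<String> transformMessages(String raw, CommonSections common);
}
record CommonSections() {}
interface MessageStore {
    void saveAll(List<String> messages);
    Iterable<String> pending();
    void inTransaction(Runnable work);
    void markDone(String message);
}
interface TargetSystemClient {
    void submit(String message);
}
```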

The second solution would replace the first flow with an external application that transforms the file, splits it into individual transformed messages, and stores them in a temporary location (the local file system). The second flow would stay in the ESB.

In our eyes, the disadvantage of the first solution is that the ESB would have to process huge files (in the first flow), which is commonly considered an antipattern. On the other hand, the ESB would adapt directly to the interface of the legacy system, which is one of the purposes of an ESB.

The disadvantage of the second solution is that the adapter application would contain the transformation logic, which should be another of the purposes and responsibilities of an ESB.

What is the commonly suggested solution (a pattern) for this situation? Could you provide some references that are more descriptive than the two links I've found?

http://publib.boulder.ibm.com/infocenter/esbsoa/wesbv7r5/index.jsp?topic=%2Fcom.ibm.websphere.wesb.programming.doc%2Ftopics%2Fesbprog_patterns.html

https://www.ibm.com/developerworks/wikis/display/esbpatterns/File+Processing

Edit: another reference: http://www.ibm.com/developerworks/webservices/library/ws-largemessaging/


There are 2 answers

Eben Roux

Remember that there are 3 message types in SOA: Command, Event, Document

That 'Document' bit is for chunks of data. It is probably better suited to 'real' document types such as 'Order' or 'Invoice' and the like, but there is nothing stopping you from going with 'TransactionBatch'.

That being said, it is a rarely used message type, in that not many service buses actually implement anything around it, since:

  • you do not really need it
  • many message queuing technologies have limits on message size (as low as 4 KB), making it difficult to transport large messages (they need to be sent in chunks)

So what I would do in your scenario is have an endpoint that processes the file: something like a ProcessTransactionFileCommand sent to the processing endpoint, containing only a reference to the actual file (stored somewhere on the file system, or even a URL to download from). That processing endpoint can process the file and send the individual messages (all within a transaction) to the integration endpoint that sends each message off to the external system. You could have a SendTransactionCommand to do that.
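A rough Java sketch of those two commands follows. The command names come from the paragraph above; the Bus interface and everything else are assumed stand-ins for whatever service bus is actually used:

```java
// A rough sketch of the command-per-file idea; the Bus interface is an
// assumed stand-in for a real service bus (Shuttle, MassTransit, ...).
import java.nio.file.Path;
import java.util.List;

// Only a reference to the file travels on the bus (the file itself stays
// on disk or behind a URL), keeping messages well under queue size limits.
record ProcessTransactionFileCommand(String fileReference) {}
record SendTransactionCommand(String transactionPayload) {}

interface Bus {
    void send(Object command);
}

class ProcessingEndpoint {
    private final Bus bus;
    ProcessingEndpoint(Bus bus) { this.bus = bus; }

    // Handles the batch: loads the referenced file, splits it, and sends one
    // SendTransactionCommand per transaction (all within one transaction, so
    // the batch is dispatched completely or not at all).
    void handle(ProcessTransactionFileCommand command) {
        List<String> transactions = loadAndSplit(Path.of(command.fileReference()));
        for (String tx : transactions) {
            bus.send(new SendTransactionCommand(tx));
        }
    }

    private List<String> loadAndSplit(Path file) {
        throw new UnsupportedOperationException("format-specific parsing");
    }
}

// The integration endpoint only ever sees individual transactions, so it can
// also accept SendTransactionCommand from other parts of the solution.
class IntegrationEndpoint {
    void handle(SendTransactionCommand command) {
        // submit command.transactionPayload() to the external system here
    }
}
```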

In this way your system is very flexible, in that the integration endpoint can receive individual integration commands from other parts of your solution, while the processing endpoint handles the batch and splits it into individual integration commands.

Should you be in the .NET space, you may want to look at my FOSS service bus project: http://shuttle.codeplex.com/

But any service bus will do the trick (MassTransit, NServiceBus, etc.).

Pablo La Greca

You can use an ESB for the first case, and I don't think it would be an anti-pattern. Part of the purpose of an ESB is to integrate legacy applications that create files as output, as in your use case, with other applications.

You can try Mule ESB. It will allow you to consume the file using streaming (through the file transport), map the content of the file to your desired output using a GUI called DataMapper, and finally put those messages on a VM queue, which can be a persistent queue within the ESB. These queues are transactional, so you can guarantee that either all the messages created from one file are put on the VM queue or none of them are. Then, from another flow (in fact, processes within the ESB are called flows in Mule), you can read each of those messages and process them.
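Mule's VM queues are not JMS, but the all-or-nothing behaviour described above can be illustrated with a transacted JMS session. This is only a sketch, assuming a JMS 2.0 provider; the ConnectionFactory and Queue would come from the broker's JNDI or configuration:

```java
// Either every message split from the file is committed to the queue,
// or none are (transacted session; assumes a JMS 2.0 provider).
import java.util.List;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

class TransactionalSplitter {

    void enqueueAllOrNothing(ConnectionFactory factory, Queue queue,
                             List<String> messages) throws Exception {
        try (Connection connection = factory.createConnection()) { // AutoCloseable in JMS 2.0
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            try {
                for (String msg : messages) {
                    producer.send(session.createTextMessage(msg));
                }
                session.commit();   // all messages from the file become visible together
            } catch (Exception e) {
                session.rollback(); // none of them are enqueued
                throw e;
            }
        }
    }
}
```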

HTH, Pablo.