We receive many large data files daily in a variety of formats (i.e. CSV, Excel, XML, etc.). In order to process these large files we transform the incoming data into one of our standard 'collection' message classes (using XSLT and a pipeline component - either built-in or custom), disassemble the large transformed message into individual 'object' messages and then call a series of SOAP web service methods to handle business logic and database operations.
Unlike other files received, the latest file will contain all data rows each day and therefore, we have to handle the differences to prevent identical records from being re-processed each day.
I have a suitable mechanism for handling inserts and updates but am currently struggling with the deletes (where the record exists in the database but not in the latest file).
My current thought process is to flag the deleted records in the database using a 'cleanup' task at the end of the entire process but this would require a method to be called once all 'object' messages from the disassembled file have completed.
Is it possible to monitor individual messages from a multi-record file and call a method on completion of the whole file? Currently, all research is pointing to an orchestration with some sort of 'wait' but is this the only option?
Example: File contains 100 vehicle records. This is disassembled into 100 individual XML messages which are processed using 100 calls to a web service method. Wish to call cleanup operation when all 100 messages are complete.
The component that does the de-batching would need to publish a start of batch message with the number of individual records and a correlation key.
The components that do the insert & update would need to publish a completion message with the same correlation key when it is completed processing.
The start of batch message would have spun up an Orchestration that would would listen for the completion messages with that correlation key and count the number, and either after it has received the correct number or after a timeout period it would call the cleanup or raise an exception.