PROBLEM DESCRIPTION
Hello, I would like to implement a service that receives data from various providers and dumps it into a database (a sort of raw data store).
The issue is that each provider delivers the data differently: some stream it through a RabbitMQ exchange, others expose an API I can pull from, and others simply share CSV files.
I believe this is a common scenario for anyone who consumes large amounts of data from multiple providers.
QUESTION
Can somebody point me to the steps I should take in the design phase of this data ingestion pipeline to make it as scalable and maintainable as possible? Perhaps some well-known design patterns, or anything else that can come in handy in these fairly common scenarios.
One approach among many might be:
What first came to mind was an integration framework like Apache Camel or Spring Integration. To make it practical, implement a route for each data source (a RabbitMQ consumer, a file consumer, an HTTP producer that polls the API, etc.), then shape/map/transform the data so it can be stored in the database, which is the last step of each route. It's quite straightforward this way.
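To make that concrete, here is a minimal Camel sketch of the idea, assuming one route per provider that converges on a shared persistence step. The exchange/queue names, endpoint URLs, polling period, and the raw_events table are all hypothetical placeholders, not anything from a real provider:

    import org.apache.camel.builder.RouteBuilder;

    public class IngestionRoutes extends RouteBuilder {
        @Override
        public void configure() {
            // Provider A: consume the RabbitMQ stream (names are placeholders)
            from("rabbitmq:providerExchange?queue=rawIngest&autoDelete=false")
                .to("direct:store");

            // Provider B: poll the HTTP API once a minute (hypothetical URL)
            from("timer:pollProviderB?period=60000")
                .to("https://api.provider-b.example/data")
                .to("direct:store");

            // Provider C: pick up shared CSV files from a drop directory
            from("file:/data/incoming?include=.*\\.csv")
                .unmarshal().csv() // camel-csv parses each file into rows
                .to("direct:store");

            // Common sink: map each payload to the raw-store schema, then insert
            from("direct:store")
                .process(exchange -> {
                    // normalize the provider-specific payload here, e.g. build
                    // the body map the SQL named parameter below expects
                })
                .to("sql:insert into raw_events (payload) values (:#payload)");
        }
    }

With Spring Boot, the camel-spring-boot starter picks up RouteBuilder beans automatically. The nice property of this shape is that adding a new provider means adding one new `from(...)` route; the transform-and-persist step behind `direct:store` stays unchanged.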