Experts: I need to transfer both incremental data (every 5 minutes) and daily data from an application database with 500+ tables to S3 for a lakehouse implementation. The volume per 5-minute interval is under 0.5 million records. Today, SQL Server CDC copies the data into a SQL Server ODS, which feeds two data marts used for operational reporting. I'd appreciate your expertise on the questions below:
- If we choose AWS Glue to transfer data to S3, do we need to write 500+ Glue jobs, one per table? Is that the right approach? Are there other tools or technologies that can transfer the data more easily?
- If we have to do both incremental (every 5 minutes) and batch (hourly/daily) loads, can the same jobs be used? If yes, where and how do we configure the extraction time period?
- If more tables or columns are added to the source database, do we need to keep writing additional jobs, or can we write a template job and call it with parameters?
- Are there other tools, apart from Glue and AWS CloudWatch, to monitor delays, failures, and long-running jobs?
This use case is a good fit for AWS Database Migration Service (AWS DMS). AWS DMS is a cloud service that migrates relational databases, data warehouses, NoSQL databases, and other types of data stores, either into the AWS Cloud or between combinations of cloud and on-premises setups. For your scenario, a single DMS replication task with the full-load-and-CDC migration type can replicate tables to an S3 target endpoint, and wildcard table mappings let one task cover all 500+ tables, so you don't need one job per table; ongoing replication then handles the 5-minute incremental changes continuously rather than on a schedule.
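A minimal sketch of creating such a task with boto3, assuming the source endpoint, S3 target endpoint, and replication instance already exist (the ARNs, task name, and the dbo schema name are placeholders):

```python
import json

import boto3

dms = boto3.client("dms")

# One selection rule with a wildcard covers every table in the schema,
# so a single task replaces hundreds of per-table jobs.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": "dbo", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="app-db-to-s3",           # placeholder name
    SourceEndpointArn="arn:aws:dms:...:endpoint/src",   # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint/tgt",   # placeholder ARN (S3 target)
    ReplicationInstanceArn="arn:aws:dms:...:rep/inst",  # placeholder ARN
    # full-load-and-cdc = initial bulk copy, then continuous change capture,
    # which covers both the daily batch and the 5-minute incremental needs.
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```

New tables that match the wildcard selection rule are picked up by the same mappings, which also addresses the question about source schema growth without writing additional jobs.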
For more information, see the AWS Database Migration Service User Guide.
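On the monitoring question: DMS publishes task metrics such as CDCLatencySource and CDCLatencyTarget to CloudWatch, so you can alarm on replication lag, failures, and long-running loads without extra tooling. A hedged sketch, assuming the dimension names and values below match what your task actually emits, and with a placeholder SNS topic for notifications:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when source-capture latency stays above 5 minutes for two
# consecutive 5-minute periods. Verify the metric and dimension names
# against what your task emits in the AWS/DMS namespace.
cloudwatch.put_metric_alarm(
    AlarmName="dms-cdc-latency-high",
    Namespace="AWS/DMS",
    MetricName="CDCLatencySource",
    Dimensions=[
        {"Name": "ReplicationInstanceIdentifier", "Value": "my-dms-instance"},  # placeholder
        {"Name": "ReplicationTaskIdentifier", "Value": "app-db-to-s3"},         # placeholder
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=300,  # seconds of lag
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:...:dms-alerts"],  # placeholder SNS topic ARN
)
```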