Data Migration to Amazon S3 buckets


Experts: I need to transfer incremental data (every 5 minutes) and daily data from an application database with 500+ tables to S3 for a lakehouse implementation. Each 5-minute interval carries fewer than 0.5 million records. Today, SQL Server CDC copies the data to a separate SQL ODS, which feeds two data marts used for operational reporting. I would appreciate your expertise on the questions below.

  1. If we choose AWS Glue to transfer data to S3, do I need to write 500+ Glue jobs, one for each table? Is this the right way of doing it? Are there other tools or technologies that can transfer the data more easily?
  2. If we have to do both incremental (every 5 minutes) and batch (hourly/daily) loads, can the same jobs be used? If yes, where and how do I configure the time period for extraction?
  3. If more tables or columns are added to the source database, do I need to keep writing additional jobs, or can I write a template job and call it with parameters?
  4. Apart from Glue and AWS CloudWatch, are there any other tools to monitor delays, failures, and long-running jobs?

There are 2 answers

smac2020 On

This use case can be solved by using AWS Database Migration Service. AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups.

See the AWS Database Migration Service User Guide for more information.

Rajasekar On

You can use AWS DMS to migrate data to an S3 target. DMS also supports CDC, which means it can keep syncing changes after the initial migration.
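As a rough illustration of what an S3 target looks like in practice, here is a minimal sketch using the boto3 DMS client's `create_endpoint` call. The helper names, the `raw` folder, the `dms_commit_ts` column name, and the choice of Parquet are assumptions made for this example, not something stated above. Pass in a client created with `boto3.client("dms")`.

```python
def build_s3_target_settings(bucket, role_arn, folder="raw"):
    """Return an S3Settings dict for a DMS target endpoint.

    The folder, data format and timestamp column name are illustrative
    assumptions; adjust them to your own lake layout.
    """
    return {
        "BucketName": bucket,
        "BucketFolder": folder,
        # IAM role that DMS assumes to write into the bucket.
        "ServiceAccessRoleArn": role_arn,
        # Parquet is usually convenient for lake queries; CSV is the default.
        "DataFormat": "parquet",
        # Adds a timestamp column to each written row.
        "TimestampColumnName": "dms_commit_ts",
    }


def create_s3_target_endpoint(dms_client, identifier, settings):
    """Register the S3 bucket as a DMS target endpoint and return its ARN."""
    response = dms_client.create_endpoint(
        EndpointIdentifier=identifier,
        EndpointType="target",
        EngineName="s3",
        S3Settings=settings,
    )
    return response["Endpoint"]["EndpointArn"]
```

The settings builder is kept separate from the API call so the configuration can be reviewed or unit tested without AWS credentials.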

To transfer data, for example from on-premises to the cloud, you need a replication instance. Its instance class can be chosen based on the volume of data being transferred.

Then create a replication task. It can be run immediately or scheduled to run at periodic intervals.
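Regarding the 500+ tables in the question: a single replication task can select many tables through a wildcard in its table mappings, so one job per table should not be necessary. The sketch below assumes the source schema is `dbo` and that the endpoint and replication instance ARNs already exist. The helper names are hypothetical. `create_replication_task` and the `full-load-and-cdc` migration type are part of the boto3 DMS client.

```python
import json


def build_table_mappings(schema, table_pattern="%"):
    """Return DMS table-mapping JSON that includes every matching table.

    "%" is the DMS wildcard, so the default selects all tables in the schema.
    """
    rules = [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-" + schema,
            "object-locator": {
                "schema-name": schema,
                "table-name": table_pattern,
            },
            "rule-action": "include",
        }
    ]
    return json.dumps({"rules": rules})


def create_task(dms_client, name, source_arn, target_arn, instance_arn,
                schema="dbo"):
    """Create a task that does an initial full load, then ongoing CDC."""
    response = dms_client.create_replication_task(
        ReplicationTaskIdentifier=name,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",
        TableMappings=build_table_mappings(schema),
    )
    return response["ReplicationTask"]["ReplicationTaskArn"]
```

With `dms_client = boto3.client("dms")`, the returned ARN can then be passed to `start_replication_task` to begin the load.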