I have been creating a new RDS table every day, and it sometimes grows to about 10 GB within that day. My goal is to dump this table to S3 more efficiently.

My current method of doing this (sketched in code below) is:

  1. SELECT * FROM table;

  2. Dump those rows to a CSV

  3. Upload that CSV to S3

  4. Delete the CSV file
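For reference, the current approach looks roughly like this (a minimal sketch, assuming a PostgreSQL instance reached via psycopg2; the host, table, file, and bucket names are all placeholders):

import os

import boto3
import psycopg2

conn = psycopg2.connect(host='my-rds-host', dbname='mydb',
                        user='myuser', password='mypassword')
s3 = boto3.client('s3')

# Steps 1-2: query the table and stream the rows into a local CSV
with conn.cursor() as cur, open('/tmp/table.csv', 'w') as f:
    cur.copy_expert('COPY daily_table TO STDOUT WITH CSV HEADER', f)

# Step 3: upload that CSV to S3
s3.upload_file('/tmp/table.csv', 'my-bucket', 'dumps/table.csv')

# Step 4: delete the local CSV file
os.remove('/tmp/table.csv')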

Is there a way to dump the RDS table to S3 directly, instead of generating the CSV locally and then uploading it?

1 Answer


Check out AWS Database Migration Service (DMS):

https://aws.amazon.com/dms/

I found creating a task for dumping RDS to S3 fairly straightforward in the console, and this guide should be applicable to you even though it discusses Aurora: https://aws.amazon.com/blogs/database/replicate-data-from-amazon-aurora-to-amazon-s3-with-aws-database-migration-service/
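The console works fine, but if you would rather script the setup, the S3 target endpoint for the task can also be created with boto3 (a sketch; the identifier, bucket, and role ARN are placeholders, and the role must grant DMS write access to the bucket):

import boto3

dms = boto3.client('dms')

# Target endpoint that tells DMS to write CSV files into an S3 bucket
dms.create_endpoint(
    EndpointIdentifier='my-s3-target',
    EndpointType='target',
    EngineName='s3',
    S3Settings={
        'BucketName': 'my-bucket',
        'BucketFolder': 'dumps',
        'ServiceAccessRoleArn': 'arn:aws:iam::123456789012:role/dms-s3-role'
    }
)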

I have a Lambda, scheduled to run periodically, that kicks off a DMS task to dump a few tables of a database.
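The schedule itself is just a CloudWatch Events/EventBridge rule that targets the Lambda. If you want to create that programmatically as well, it looks something like this (a sketch; the rule name, schedule expression, and function name/ARN are placeholders):

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:start-dms-dump'

# Fire once a day
rule = events.put_rule(Name='daily-dms-dump', ScheduleExpression='rate(1 day)')

# Point the rule at the Lambda...
events.put_targets(Rule='daily-dms-dump',
                   Targets=[{'Id': 'start-dms-dump', 'Arn': FUNCTION_ARN}])

# ...and allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName='start-dms-dump',
    StatementId='allow-eventbridge',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)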

The code for the Lambda itself (or just a manual job) is really simple (Python 3.6):

import os
import time

import boto3

DMS_TASK_ARN = os.environ.get("DMS_TASK_ARN")
MAX_RETRY = int(os.environ.get("MAX_RETRY", 600))
WAIT_TIME = int(os.environ.get("WAIT_TIME", 10))

dms = boto3.client('dms')

def lambda_handler(event, context):
    # Start the replication task; 'reload-target' re-runs the full load,
    # so each invocation produces a fresh dump on the S3 target
    print(f"Starting replication task {DMS_TASK_ARN}")
    dms.start_replication_task(
        ReplicationTaskArn=DMS_TASK_ARN,
        StartReplicationTaskType='reload-target'
    )

    # Wait for the task to finish, polling every WAIT_TIME seconds,
    # up to MAX_RETRY times
    for retry in range(MAX_RETRY):
        response = dms.describe_replication_tasks(
            Filters=[{'Name': 'replication-task-arn',
                      'Values': [DMS_TASK_ARN]}]
        )
        status = response['ReplicationTasks'][0]['Status']
        print(f"Replication task status: {status} (retry {retry})")
        if status == 'stopped':
            break
        time.sleep(WAIT_TIME)
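Note that the Lambda's execution role needs the dms:StartReplicationTask and dms:DescribeReplicationTasks permissions, and the function timeout has to cover however long you let it poll. Besides 'reload-target' (which re-runs the full load each time, which is what you want for a fresh daily dump), StartReplicationTaskType also accepts 'start-replication' for a task's first run and 'resume-processing' to pick up where a task left off.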