I need to back up 6 DynamoDB tables every couple of hours. I created 6 pipelines from templates and they ran fine, except that they created 6 or more EC2 instances which mostly stayed up between runs. That's not a cost I can afford.
Does anyone have experience optimizing this kind of scenario?
Some solutions that come to mind are:
One: To ensure that EC2 resources are terminated, you can set the `terminateAfter` property on the `Ec2Resource` definition. The semantics of `terminateAfter` are discussed here - How does AWS Data Pipeline run an EC2 instance?.
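As an illustration, here is a minimal sketch of that property in the fields format that boto3's `put_pipeline_definition` expects; the object ids, the 30-minute cutoff, and the pipeline id are assumptions, and the rest of the definition (schedule, activity, data nodes) is elided:

```python
import boto3

# Sketch: an Ec2Resource carrying terminateAfter. Ids and the
# 30-minute cutoff are placeholder assumptions.
ec2_resource = {
    "id": "BackupEc2Resource",
    "name": "BackupEc2Resource",
    "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        {"key": "instanceType", "stringValue": "m1.small"},
        # terminateAfter caps the instance's lifetime, so it is shut
        # down instead of staying up between the two-hourly runs.
        {"key": "terminateAfter", "stringValue": "30 Minutes"},
        {"key": "role", "stringValue": "DataPipelineDefaultRole"},
        {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
    ],
}

client = boto3.client("datapipeline")
client.put_pipeline_definition(
    pipelineId="df-EXAMPLE",         # assumed pipeline id
    pipelineObjects=[ec2_resource],  # plus the rest of the definition
)
```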
Two: This thread on the AWS forums discusses how an existing EC2 instance can be used by Data Pipeline: you run Task Runner on the instance and point activities at a worker group instead of a managed resource.
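If I read that thread right, the mechanism is Task Runner plus a worker group; a sketch under that assumption, with the group name, command, and activity type being placeholders:

```python
# On the instance you already keep running, start Task Runner, e.g.:
#   java -jar TaskRunner-1.0.jar --config credentials.json \
#        --workerGroup=wg-dynamodb-backup --region=us-east-1
# Then give the activity a workerGroup instead of a runsOn reference,
# so Data Pipeline launches no new instance for it.
backup_activity = {
    "id": "BackupActivity",
    "name": "BackupActivity",
    "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        # The Task Runner polling this group executes the task.
        {"key": "workerGroup", "stringValue": "wg-dynamodb-backup"},
        # Placeholder command; the real backup step goes here.
        {"key": "command", "stringValue": "echo backup step goes here"},
    ],
}
```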
Three: The backup pipeline template always creates a single pipeline with a single activity that reads from a single source table and writes to a single destination. You can view the JSON source of the pipeline in the AWS console and write a similar definition with multiple activities, one for each table you want to back up. Since that definition will contain only one EMR resource, that single resource will do the work of all the activities.
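A sketch of that layout, assuming hypothetical table names and eliding the data nodes and the EMR step string (the template's JSON source shows the exact step to copy):

```python
# One pipeline, one EmrCluster, one export activity per table: each
# run starts a single cluster instead of one instance per table.
# Ids, table names, and terminateAfter are placeholder assumptions.

def field(key, value, ref=False):
    # Build one Data Pipeline field entry (refValue for object references).
    return {"key": key, ("refValue" if ref else "stringValue"): value}

emr_cluster = {
    "id": "SharedEmrCluster",
    "name": "SharedEmrCluster",
    "fields": [
        field("type", "EmrCluster"),
        field("terminateAfter", "2 Hours"),
    ],
}

def export_activity(table):
    # One EmrActivity per table; runsOn points all of them at the same
    # cluster. The step string (copied from the template) is elided.
    return {
        "id": f"Export{table}",
        "name": f"Export{table}",
        "fields": [
            field("type", "EmrActivity"),
            field("runsOn", "SharedEmrCluster", ref=True),
        ],
    }

tables = ["Users", "Orders", "Sessions"]  # assumed table names
pipeline_objects = [emr_cluster] + [export_activity(t) for t in tables]
```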